A Field Guide

Agentic Engineering & Harness Ownership

A reference companion to Andy "Dev Dan" Hennings' five-pillar talk on the top opportunity for senior engineers, with the Pi coding agent as the worked example of what harness ownership looks like in practice. Concepts only. No exercises. Read it once, return to it often.

Chapter 00·Foreword

Foreword #

This module is a faithful walk-through of a single 26-minute talk. The speaker compresses two weeks of unplugged thinking into one claim: the gap between low- and high-performing agentic engineers is not the model and not the agent product; it is the system around the agent. Five pillars define that system. The talk argues that owning your agent harness sits at the root of all five.

Throughout the module we treat the Pi coding agent (pi.dev) as the running illustration. Pi is the harness the speaker uses every day. We cite it not as an endorsement but because it is the most concrete public artifact of "extensible by design," and looking at the artifact makes the principles legible.

How this is organized

Each pillar gets its own chapter. Inside each chapter: the core claim, the underlying principle, why it matters, common pitfalls, and a primary-source pointer when the chapter touches material outside the talk. A short glossary and a primary-source index sit at the back.

A note on voice

The speaker is plain-spoken and frames the talk as a "message to myself." This wiki keeps that register. When the talk overstates, we say so. When it understates, we say so. The point is to leave you with calibrated beliefs, not slogans.

Chapter 01·The opportunity

The top opportunity for senior engineers #

The opportunity has been the same for over a year: agentic engineering. What changed is its size and its proximity to becoming the default.

Core claim

"By the end of 2026, agentic engineering will be the default." The talk argues the window for being early is closing because Andrej Karpathy named the field directly at the Sequoia AI Ascent, and when Karpathy names a thing the industry follows. Translation: the term has captured shared vocabulary and the practice will follow.

Two engineers, same agent, different outcomes

The talk's recurring framing: two engineers using the exact same agent with 200k tokens get massively different results. The difference is not the agent. It is the surrounding system. The rest of the module is a taxonomy of that system.

The five pillars, in one line each

Agent harness: The runtime the agent lives in. Owning it is leverage. Renting it is a ceiling.
Software factory: The system that builds the system. Build factories, not features.
Extensible software: Open to extension, closed to modification. Adaptability is the survival trait when models and tools move weekly.
Always-on agents: AFK agents that work while you sleep. Only useful after you have proven the token arbitrage.
Agentic access: APIs, CLIs, RPC, webhooks. Agents only command what they can programmatically reach. Anything else is a token tax.

Why these five and not others

The speaker explicitly omits "models" from the list. The reasoning: for 80 to 90 percent of daily work, models matter less than the systems around them. Models are a bleeding-edge concern. Harness, factory, extensibility, always-on, and access are compounding concerns. They keep paying after the model du jour is replaced.

Where the term comes from

The Karpathy reference is to the Sequoia AI Ascent stage, where he framed "agentic engineering" as a discipline distinct from prompt-tuning and from traditional software engineering. Treat that talk as the canonical naming event.

Chapter 02·Pillar one

Agent harness #

Whoever controls the agent harness controls your results. The harness is the runtime the agent inhabits, and the runtime decides what is even possible.

"Claude Code, Codex, OpenCode. These tools are fantastic. They were a great start. They are a terrible place to finish." — Chapter 2 of the talk

Definition

An agent harness is the program that hosts an LLM-driven loop. It owns the system prompt, the tool registry, the context window strategy, the I/O channels, the permission model, and the lifecycle of every sub-process the agent spawns. Two agents using identical model weights but different harnesses are not the same product.

Why ownership is the leverage point

The speaker's argument is straightforward. The agent gives you the speed. The agent lives in the harness. Therefore the harness gates everything: which models you can swap to, which tools you can give the agent, whether you can build a sandbox, whether you can run multi-agent orchestration, whether you can add a verifier loop. If you rent the harness, every one of those gates is somebody else's decision.

Two classes of custom harnesses

The talk distinguishes between general-purpose and domain-specific harnesses. Both are valuable. The second is where most engineers leave money on the table.

Engineering-pattern harnesses: General-purpose customizations of the loop itself: multi-agent teams, plan-then-act chains, verifier harnesses that have one agent check another agent's work.
Domain-specific harnesses: One thing done extraordinarily well. A DevOps harness. A testing harness. A billing harness. Specialization is the moat.
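To make the "verifier harness" idea concrete, here is a minimal sketch of one agent checking another agent's work. Everything in it is hypothetical: the type names, the retry policy, and the stub agents are illustrative assumptions, not code from the talk or from Pi. Real agents would be asynchronous model calls; plain functions keep the shape visible.

```typescript
// Hypothetical types for illustration; not taken from Pi or any real harness.
type AgentFn = (prompt: string) => string;

interface VerifierVerdict {
  approved: boolean;
  feedback: string;
}

type VerifierFn = (task: string, draft: string) => VerifierVerdict;

interface VerifiedOutcome {
  output: string;
  approved: boolean;
  rounds: number;
}

// One agent produces a draft, a second agent checks it, and rejected drafts
// are retried with the verifier's feedback appended to the prompt.
function runWithVerifier(
  task: string,
  worker: AgentFn,
  verifier: VerifierFn,
  maxRounds = 3,
): VerifiedOutcome {
  let prompt = task;
  let output = "";
  for (let round = 1; round <= maxRounds; round++) {
    output = worker(prompt);
    const verdict = verifier(task, output);
    if (verdict.approved) return { output, approved: true, rounds: round };
    prompt = `${task}\n\nReviewer feedback: ${verdict.feedback}`;
  }
  // Round budget exhausted: surface the last draft as unapproved.
  return { output, approved: false, rounds: maxRounds };
}

// Stub agents standing in for real model calls.
const stubWorker: AgentFn = (prompt) =>
  prompt.includes("Reviewer feedback")
    ? "function add(a, b) { return a + b; } // tested"
    : "function add(a, b) { return a + b; }";

const stubVerifier: VerifierFn = (_task, draft) =>
  draft.includes("tested")
    ? { approved: true, feedback: "" }
    : { approved: false, feedback: "state whether tests were run" };

const verified = runWithVerifier("Write add()", stubWorker, stubVerifier);
```

The round budget is the important design choice: without it, a worker and a verifier that never agree will trade messages indefinitely.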

What the talk's UIA J Team example actually shows

The speaker demos a three-tier orchestration system built on top of his harness: an orchestrator at the top, team leads in the middle, workers at the bottom, all communicating through a chat-room interface. The point is not the demo. The point is that this shape of system is impossible with a default agent product because the host you are renting does not expose the primitives required.

A renter cannot specialize

The strong form of the claim: if you do not own the harness, you cannot build a domain-specific agent. You can only configure within someone else's frame. For one-off tasks this is fine. For a durable advantage it is not.

What "owning" actually means

Ownership is not "you wrote it from scratch." It means you can change the system prompt, swap providers and models mid-session, add or remove tools, layer in permission gates, control compaction, intercept and rewrite messages, and ship those changes to your team without waiting on a vendor release cycle. Pi is one example of a harness designed to make this kind of ownership cheap (pi.dev, source on GitHub).

Common pitfalls

Confusing customization with ownership: keystroke bindings and a settings.json are not a harness.
Building a harness in isolation: the leverage shows up when the harness is reused across many projects, not on the first one.
Treating the harness as a personal toy: if it doesn't ship to your team, the moat is one person wide.

Want a concrete example?

See Chapter 08 (case study) for a worked example of what harness ownership actually unlocks: a four-tool extension that turns Pi into a peer-to-peer agent network, demonstrated on a PII-safe production-to-dev workflow and a feature-parity build between two cloud sandboxes.

Chapter 03·Pillar two

Software factory #

Build factories, not features. The unit of engineering work shifts from "the next feature" to "the system of agents and code that produces features on spec, every time."

Core claim

You move your focus into the system that builds the system. The output per unit of time goes parabolic because one prompt invokes a factory that plans, scouts, validates, builds, tests, and reviews on your behalf.

"A plan is a prompt scaled. That's all a plan is. It's a more detailed prompt." — Chapter 3 of the talk

Anatomy of a factory

The talk sketches a pipeline rather than a single step. Each stage is a teachable, templatable workflow.

Plan / spec. The plan prompt is the formula for how engineering work is described. It is the first place the factory shows up.
Plan review. A second pass over the plan, often by a different agent, before any code is written.
Scouting. Locating the right files, modules, and dependencies the change will touch.
Validation. Constraint checks against the spec before execution.
Build. Actually producing the change.
Test. The factory always runs tests. No exceptions.
Review. A reviewing agent, or a staging environment, or a regression-fixing team of agents.
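The stages above can be sketched as a pipeline of functions with a hard stop at the first failure. This is a minimal sketch under stated assumptions: the WorkItem shape, the stage labels, and the stub stages are all hypothetical, and in a real factory the plan, build, and review stages would call agents while the validate and test stages would be deterministic code.

```typescript
// Hypothetical shapes for illustration only.
interface WorkItem {
  spec: string;
  plan?: string;
  files: string[];
  diff?: string;
  log: string[];
}

interface StageOutcome {
  ok: boolean;
  item: WorkItem;
  note: string;
}

interface FactoryStage {
  label: string;
  run: (item: WorkItem) => StageOutcome;
}

interface FactoryRun {
  shipped: boolean;
  item: WorkItem;
  failedAt?: string;
}

// Run stages in order, record what each one did, and stop at the first failure.
function runFactory(stages: FactoryStage[], spec: string): FactoryRun {
  let item: WorkItem = { spec, files: [], log: [] };
  for (const stage of stages) {
    const outcome = stage.run(item);
    item = { ...outcome.item, log: [...outcome.item.log, `${stage.label}: ${outcome.note}`] };
    if (!outcome.ok) return { shipped: false, item, failedAt: stage.label };
  }
  return { shipped: true, item };
}

const passStage = (item: WorkItem, note: string): StageOutcome => ({ ok: true, item, note });

// Stub stages; the file name and diff text are made up.
const factoryStages: FactoryStage[] = [
  { label: "plan", run: (i) => passStage({ ...i, plan: `Plan for: ${i.spec}` }, "plan written") },
  { label: "plan-review", run: (i) => ({ ok: (i.plan ?? "").length > 0, item: i, note: "plan checked" }) },
  { label: "scout", run: (i) => passStage({ ...i, files: ["src/auth.ts"] }, "1 file located") },
  { label: "validate", run: (i) => ({ ok: i.files.length > 0, item: i, note: "spec constraints checked" }) },
  { label: "build", run: (i) => passStage({ ...i, diff: "+ fix lockout check" }, "diff produced") },
  { label: "test", run: (i) => ({ ok: i.diff !== undefined, item: i, note: "tests run" }) },
  { label: "review", run: (i) => passStage(i, "approved") },
];

const factoryRun = runFactory(factoryStages, "Fix lockout bug");
```

Because every stage has the same signature, adding a stage or swapping the agent behind one is a one-line change to the array, not a change to runFactory.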

Two names you will see in the wild

ADW — AI Developer Workflow: The speaker's preferred term in his "Tactical Agentic Coding" course. An ADW combines agents plus deterministic code to outperform either alone.
Dark factory: The industry term for the same idea: an engineering pipeline that runs without human-in-the-loop on the critical path. Borrowed from manufacturing's "lights-out" factory.

The mindset shift

"You are not the engineer that builds the feature. You are the engineer that builds the system of AI plus code that operates on your behalf." This is hard. The talk concedes it. Most engineers' identity is welded to shipping features. Untangling that takes deliberate practice.

Where the ceiling is

The speaker introduces ZTE — Zero Touch Engineering as the asymptote: prompt directly to production. He flags it as super advanced and out of scope. The honest framing: you do not need ZTE to win. You need a factory that takes you from prompt to "near production" reliably. ZTE is the limit point, not the entry bar.

Honest caveat

"Parabolic output per unit of time" is a marketing line. What is defensible is: a working factory makes a class of repeatable work much cheaper and more consistent, and it frees the human for work that does not repeat. The leverage is real. The growth curve depends on how much of your work is repeatable.

Manufacturing analog

The factory metaphor is borrowed deliberately. For background on how repeatability and tolerances drove industrial output, see Henry Ford's moving assembly line in 1913 and Taiichi Ohno's Toyota Production System. The agentic translation is identical: standardize the process, instrument every stage, fix defects at the station they appear.

Chapter 04·Pillar three

Extensible software #

When models change weekly and tools change daily, brittle software is a liability. Pluggability, composability, and "open to extension, closed to modification" are survival traits.

Core claim

The pace of change is the dominant variable. Models release. Tools release. Prompts evolve. The best response is not to predict; it is to build software that absorbs change without breaking. The speaker frames this as one of two ideas he personally underweighted earlier in his agentic engineering work. The other was the harness.

"Open to extension, closed to modification." — the Open-Closed Principle, restated for the agentic era

Two surfaces where extensibility pays

Engineering surface: Your harness, your factory, your dev tooling. The win is being able to swap a model, slot in a new tool, or test a new prompt without a rewrite.
Product surface: The software you ship. AI involvement is incidental. The same principle applies: when the rate of change is high, code that adds is cheaper than code that modifies.

What "extensible" looks like in practice

The Pi coding agent is the talk's running example of an extensible harness: extensions are TypeScript modules with access to tools, commands, keyboard shortcuts, events, and the full TUI. Sub-agents, plan mode, permission gates, and sandboxes are not baked in. They are extensions that ship as packages and install from npm or git (Pi extensions docs). The architectural decision is to ship primitives, not features.

Primitives over features

Pi explicitly chooses not to ship MCP, sub-agents, plan mode, permission popups, built-in to-dos, or background bash. Each can be added as an extension. The cost is that you do more configuration. The benefit is that the system survives the next pivot in agent tooling without an internal rewrite.
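A minimal sketch of what "primitives, not features" can look like in code: a core that knows only how to register tools and commands, with plan mode and to-dos arriving as extensions. The HarnessCore class and the extension contract here are hypothetical and far smaller than Pi's real extension API; the point is only that new behavior is added without editing the core.

```typescript
// Hypothetical extension contract for illustration; not Pi's actual API.
interface ToolSpec {
  label: string;
  run: (input: string) => string;
}

type CommandHandler = (args: string) => string;

interface HarnessExtension {
  id: string;
  tools?: ToolSpec[];
  commands?: Record<string, CommandHandler>;
}

class HarnessCore {
  private readonly tools = new Map<string, ToolSpec>();
  private readonly commands = new Map<string, CommandHandler>();

  // The only way new behavior enters the harness: core code is never edited.
  install(ext: HarnessExtension): void {
    for (const tool of ext.tools ?? []) {
      if (this.tools.has(tool.label)) throw new Error(`tool already registered: ${tool.label}`);
      this.tools.set(tool.label, tool);
    }
    const commands = ext.commands ?? {};
    for (const cmd of Object.keys(commands)) {
      const handler = commands[cmd];
      if (handler) this.commands.set(cmd, handler);
    }
  }

  toolNames(): string[] {
    return Array.from(this.tools.keys());
  }

  callTool(label: string, input: string): string {
    const tool = this.tools.get(label);
    if (!tool) throw new Error(`unknown tool: ${label}`);
    return tool.run(input);
  }

  runCommand(cmd: string, args: string): string {
    const handler = this.commands.get(cmd);
    if (!handler) throw new Error(`unknown command: ${cmd}`);
    return handler(args);
  }
}

// Two features the core does not ship, added from the outside.
const planModeExt: HarnessExtension = {
  id: "plan-mode",
  commands: { plan: (args) => `PLAN.md <- ${args}` },
};

const todoExt: HarnessExtension = {
  id: "todo",
  tools: [{ label: "todo_add", run: (input) => `added: ${input}` }],
};

const harnessCore = new HarnessCore();
harnessCore.install(planModeExt);
harnessCore.install(todoExt);
```

The duplicate-name check in install is the kind of contract detail that turns into maintenance cost: every rule the core enforces on extensions is a rule it has to keep honoring.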

Why this is harder than it sounds

"Build pluggable software" is easy to say. The hidden tax is interface design: every extension point is a contract you now have to maintain. Done well, this is a deep module with a small surface and large internal complexity (the Ousterhout ideal). Done badly, it is a brittle plugin system whose every change breaks downstream.

The "vibe coding trash" trap

The talk's framing: if you are generating slop and shipping tech debt, extensibility will not save you. Extensibility presumes a deliberate interface boundary. Generated code without that boundary is just more code to maintain, faster.

Where the principle comes from

"Open to extension, closed to modification" was articulated by Bertrand Meyer in Object-Oriented Software Construction (1988) and popularized as the "O" in the SOLID principles. The agentic-era restatement adds: extension points must include the model, the tool registry, and the context strategy, not just the type hierarchy.

Chapter 05·Pillar four

Always-on agents (AFK agents) #

Always-on is the ceiling, not the entry move. You earn the right to run agents 24/7 by first proving that the tokens you spend create value you can capture.

Core claim

Anyone can spin up an agent in a while-loop. That is "token maxing" and it is the floor. The high move is to turn on agents only after you have verified the token economics. The discipline is in not turning them on prematurely.

Tokenomics in three levels

The talk lays out a three-stage funnel. Each stage gates the next.

Level	Behavior	State you want to leave
1. Token max	Use more tokens.	Spend without measuring value.
2. Useful tokens	Make those tokens valuable.	Value generated but not captured.
3. Revenue capture	Convert value to revenue or measurable outcome.	This is where you turn the agent always-on.

The arbitrage

The unit economics in the speaker's framing: buy a token for one dollar, run it through your business process, produce two dollars of value, capture the difference. Once that loop closes, scale it. This is the same logic that drives ad spend in any growth-stage company. The novelty is that the input good is compute.
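The arithmetic is simple enough to write down. The sketch below uses the talk's one-dollar-in, two-dollars-out example; the ledger shape and the exact thresholds that separate the three levels are assumptions made for illustration, not definitions from the talk.

```typescript
// Hypothetical ledger for one agent workflow, in US dollars.
interface AgentRunLedger {
  tokenCostUsd: number;     // what the tokens cost
  valueCreatedUsd: number;  // value of the artifacts produced
  valueCapturedUsd: number; // value actually converted to revenue or a measured outcome
}

type TokenLevel = "token-max" | "useful-tokens" | "revenue-capture";

// Assumed thresholds: level 3 requires captured value to exceed cost;
// level 2 requires some created value; everything else is level 1.
function classifyLevel(ledger: AgentRunLedger): TokenLevel {
  if (ledger.valueCapturedUsd > ledger.tokenCostUsd) return "revenue-capture";
  if (ledger.valueCreatedUsd > 0) return "useful-tokens";
  return "token-max";
}

// The arbitrage margin: what is left after paying for the tokens.
function arbitrageMargin(ledger: AgentRunLedger): number {
  return ledger.valueCapturedUsd - ledger.tokenCostUsd;
}

// Only a closed loop earns the right to run unattended.
function readyForAlwaysOn(ledger: AgentRunLedger): boolean {
  return classifyLevel(ledger) === "revenue-capture";
}

// The talk's example: buy a token for $1, produce $2 of value, capture it.
const talkExample: AgentRunLedger = { tokenCostUsd: 1, valueCreatedUsd: 2, valueCapturedUsd: 2 };
const talkMargin = arbitrageMargin(talkExample); // 2 - 1 = 1
```

The same two dollars of created value with nothing captured classifies as "useful-tokens", which is the case the talk warns against scaling.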

"Your rising API bill becomes a productivity KPI. But only after you get out of level one and level two." — Chapter 5 of the talk

What "useful" actually means

A useful token is one that contributes to an outcome someone will pay for, in cash or in time saved. The talk is blunt about the failure mode: a million crontab-driven agents are running right now and 90 percent of them are dead-useless and burning cash. The diagnostic is whether you can trace each agent run to a value-bearing artifact.

Premature always-on is expensive

The natural impulse is to turn things on the moment they work. Resist. Validate the arbitrage on a small loop first. Always-on is a force multiplier in both directions: it multiplies your wins and your waste.

What the speaker's own token usage looks like

He claims his token growth is a "very smooth curve" because he refuses to scale anything before the value-capture step. Treat this as a calibration: high-performing agentic engineers are often not the highest-token-spend engineers. They are the ones whose tokens convert.

Adjacent thinking

The arbitrage framing borrows from classic unit economics. For background, Bill Gurley's essay on LTV math is useful, and Andrew Chen's "Law of Shitty Clickthroughs" explains why arbitrages erode and need to be re-found.

Chapter 06·Pillar five

Agentic access #

Agents only command what they can programmatically reach. Anything you do by hand that an agent could do via API is a tax you pay in tokens, time, and consistency.

Core claim

API access is a requirement of agentic speed. CLIs, REST endpoints, webhooks, RPC clients. If the agent cannot get there, the agent cannot help. The diagnostic question the talk insists on: "If an agent could do this and isn't, why not?"

"Agents only command what they can programmatically reach." — Chapter 6 of the talk

The token tax, defined

A token tax is any work an agent does inefficiently because you have not given it direct API access. The agent burns tokens scraping, parsing, retrying, or asking the human to do the thing manually, all because the tool surface was missing. The remedy is investment in tool surfaces, not investment in better prompts.

Where to look first

Codebases and repos: agents need git, gh, build, lint, and test as first-class tools.
Products you operate: every admin action you can do in a UI should also be reachable via API.
Devices and infrastructure: deploys, restarts, log queries, metric pulls.
Internal data: search, query, and writes against your own systems of record.

Where to NOT give access

The talk is explicit. You do not give production access by default. You do not give an agent permission to nuke databases, volumes, or shared infrastructure. The bash tool gets locked down. Agentic access is not the same as agentic carelessness. The point is to remove unjustified friction, not to remove justified guardrails.

How this connects back to the harness

An extensible harness is what makes selective access cheap. In Pi, for example, access is granted through extensions that wire tools, plus permission gates and protected paths that wrap them (permission-gate.ts, protected-paths.ts). The same harness that grants access also enforces the boundary. Without that, access becomes binary and unsafe.
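As a rough illustration of a gate that grants access and enforces the boundary in the same place, here is a minimal preflight check. It is a sketch, not the contents of permission-gate.ts or protected-paths.ts: the tool names, the deny patterns, and the protected prefixes are all hypothetical.

```typescript
// Hypothetical shapes; a real harness would pass richer call objects.
interface PendingToolCall {
  tool: string;
  args: { command?: string; path?: string };
}

interface GateDecision {
  block: boolean;
  reason: string;
}

// Illustrative deny-lists. These are assumptions, not a vetted security policy.
const PROTECTED_PREFIXES: string[] = [".env", ".git/", "infra/prod/"];
const DENIED_COMMAND_PATTERNS: RegExp[] = [
  /\brm\s+-rf\b/,
  /\bdrop\s+(database|table)\b/i,
  /\bterraform\s+destroy\b/,
];

// Returns a decision when the call must be blocked, undefined when it may run.
function preflight(call: PendingToolCall): GateDecision | undefined {
  const command = call.args.command;
  if (call.tool === "bash" && command !== undefined) {
    const hit = DENIED_COMMAND_PATTERNS.find((pattern) => pattern.test(command));
    if (hit) return { block: true, reason: `command matches denied pattern ${hit.source}` };
  }
  const target = call.args.path;
  if ((call.tool === "write" || call.tool === "edit") && target !== undefined) {
    const prefix = PROTECTED_PREFIXES.find((p) => target.startsWith(p));
    if (prefix !== undefined) return { block: true, reason: `path is protected: ${prefix}` };
  }
  return undefined;
}
```

Pattern matching on command strings is easy to bypass, so treat a gate like this as friction rather than a guarantee. Running the agent in a container is the stronger boundary; the gate covers the cases where the agent needs real access to some things and none to others.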

Adjacent reading

The principle "agents only command what they can reach" rhymes with the Unix philosophy of small tools wired together. See Doug McIlroy's notes on building blocks and the original Ritchie and Thompson CACM paper. The agent is the new shell. Your tools are the new pipeline.

Chapter 07·Synthesis

The compound effect #

The five pillars are not a checklist. They compose. Each one increases the leverage of the others. Owning the harness makes the factory possible. The factory makes always-on safe. Extensibility keeps both from rotting. Agentic access removes the friction that prevents either from running at agent speed.

How the pillars stack

If you have...	You unlock...
Harness ownership	The ability to build a custom factory and to wire selective access.
Software factory	Repeatable, on-spec output you can trust enough to leave running.
Extensible software	The factory survives model and tool changes without a rewrite.
Always-on agents	Productive output during hours you are not at the keyboard.
Agentic access	Each pillar runs at agent speed, not human speed.

The speaker's final framing

"Vibe coding is the lowest hanging fruit. Do not sit in the terminal prompting out your features. Build the software factory. Own the agent harness. Make your products extensible. Learn to arbitrage your tokens. Expose your CLIs and APIs everywhere."

The honest one-line summary

If you remember nothing else: the agent is the engine, the harness is the chassis, the factory is the assembly line, extensibility is the maintainability of the line, always-on is the night shift, and agentic access is the loading dock. A car plant without any one of those is not a car plant.

What the talk explicitly does not say

It does not name a model. By design. Models are a bleeding-edge concern. The pillars are not.
It does not promise that any one tool is best. Pi is the example; the principles do not require Pi.
It does not say this is easy. It says the opposite: this is a software engineering skill that takes deliberate practice.

Chapter 08·Case study

Pi-to-Pi: agent-to-agent communication #

A worked example of what harness ownership unlocks: two (or more) Pi agents that talk to each other as peers, on the same device or across the network, with no orchestrator. Drawn from the second talk in the series, "Pi to Pi Agent Communication."

Why this is in the wiki

Chapter 02 made the abstract case for owning the harness. This chapter is the concrete one. The pattern below is impossible inside a rented agent product. It is trivial inside a harness you control.

The thesis in one line

"What is better than one Pi agent? Two Pi agents that actually work together." The point is not the number. The point is the topology. Most multi-agent systems today are top-down: an orchestrator delegates to workers, information flows one way. Pi-to-Pi inverts that: every agent is a peer, every channel is bidirectional, and the best information wins regardless of which agent had it.

Four communication topologies, in order of expressiveness

The talk lays these out as a progression. Each topology is a real pattern with real uses; the higher tiers do not deprecate the lower ones.

Topology	Direction	Typical use
Sub-agent delegation	Parent → child (one-way, scoped)	"Do this subtask and report back." The current default.
Message queue / broker	Hub-and-spoke through a broker	Coordinated parallel work where one agent owns the queue. (The pattern Claude Code's "agent teams" uses.)
Agent chain (deterministic)	Pipeline with code between nodes	AI Developer Workflows. Adds determinism by inserting code at each handoff.
Peer-to-peer (bidirectional)	Any agent ↔ any agent	Flat coordination. The new ground Pi-to-Pi opens up.

Why flat beats hierarchical

The argument leans on a familiar observation from organizational design: in any hierarchy, the best information is usually at the bottom (the people doing the work), and it dies on the way up because it lacks title or authority. Flat structures let valuable information win on its merits. The talk cites Nvidia's famously flat reporting structure and startups generally as examples. The agentic analog: in a delegation tree, the worker agent often has the best context but no channel to share it laterally. Peer-to-peer gives it one.

The four-tool protocol

There is "basically no magic" here. The whole system is four tools exposed to each agent:

list: Enumerate the other agents currently on the network.
send: Send a prompt to a named peer. Returns a message ID.
await: Blocking wait on a specific message ID for the peer's reply.
check: Non-blocking poll. Use when an agent should keep working while a peer thinks.

That is the entire surface. Two flavors of the extension ship in the speaker's public repo: comms (single-device, in-process pool) and comms-net (a lightweight Bun HTTP server that lets agents connect across machines). Both are deliberately simple. The recommendation is to read the code, then have your own agent adapt it for your security and topology requirements.
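To show how little is needed, here is a minimal in-memory sketch of the four tools. It is not the speaker's comms or comms-net code: the PeerPool class, the envelope shape, and the step method are hypothetical, and peers are plain synchronous functions standing in for agents. The method is named awaitReply because await is a reserved word in TypeScript modules.

```typescript
// Hypothetical in-memory pool; real peers would be agents answering asynchronously.
type PeerHandler = (prompt: string, from: string) => string;

interface Envelope {
  id: string;
  from: string;
  to: string;
  prompt: string;
  reply?: string;
}

class PeerPool {
  private readonly handlers = new Map<string, PeerHandler>();
  private readonly envelopes = new Map<string, Envelope>();
  private readonly inbox: string[] = []; // message IDs waiting to be handled
  private nextId = 1;

  join(peer: string, handler: PeerHandler): void {
    this.handlers.set(peer, handler);
  }

  // list: enumerate the other agents currently in the pool.
  list(caller: string): string[] {
    return Array.from(this.handlers.keys()).filter((peer) => peer !== caller);
  }

  // send: queue a prompt for a named peer and return a message ID.
  send(from: string, to: string, prompt: string): string {
    if (!this.handlers.has(to)) throw new Error(`unknown peer: ${to}`);
    const id = `msg-${this.nextId++}`;
    this.envelopes.set(id, { id, from, to, prompt });
    this.inbox.push(id);
    return id;
  }

  // check: non-blocking poll; undefined means the peer has not replied yet.
  check(id: string): string | undefined {
    return this.envelopes.get(id)?.reply;
  }

  // await: block until the reply exists, letting queued peers work in the meantime.
  awaitReply(id: string): string {
    let reply = this.check(id);
    while (reply === undefined) {
      if (!this.step()) throw new Error(`no reply possible for ${id}`);
      reply = this.check(id);
    }
    return reply;
  }

  // Let the recipient of the oldest queued message handle it.
  step(): boolean {
    const id = this.inbox.shift();
    if (id === undefined) return false;
    const envelope = this.envelopes.get(id);
    const handler = envelope ? this.handlers.get(envelope.to) : undefined;
    if (envelope && handler) envelope.reply = handler(envelope.prompt, envelope.from);
    return true;
  }
}

const peerPool = new PeerPool();
peerPool.join("prod", (prompt) => `prod received: ${prompt}`);
peerPool.join("dev", (prompt) => `dev received: ${prompt}`);

const msgId = peerPool.send("dev", "prod", "Send the affected slice, PII stripped");
const earlyPoll = peerPool.check(msgId);    // nothing yet: prod has not run
const peerReply = peerPool.awaitReply(msgId);
```

A networked version changes where the envelopes live, not the four-tool surface the agent sees, which is why the protocol stays small.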

"Read and adapt," not "install and forget"

The shipping extensions are reference implementations. The talk is explicit: secure them and harden them for your specific use case. Anything that accepts network requests from other agents is a permission boundary you need to take seriously.

Demo 1: PII-safe production-to-dev workflow

The first demo is a routine engineering problem with a real constraint: a Pro-tier user is hitting a lockout bug in production; the fix requires reproducing it locally; production contains personally identifiable information that must not leak. Setup:

Prod agent on a Mac mini hosting the production database. Knows the schema, knows what is PII, will not expose it.
Dev agent on a laptop. Job: bring the affected slice over with PII stripped so an engineer can reproduce locally.

The dev agent sees the prod peer on the network, sends a request for the affected slice, awaits the message ID. The prod agent fetches, redacts, and replies. They iterate until the dev agent confirms a clean repro. No orchestrator. No human in the loop on the back-and-forth. The boundary is enforced by the prod agent's own instructions and by the harness, not by a separate access-control layer.
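One way to back the prod agent's instructions with deterministic code is a redaction pass that runs before any row leaves the prod side. The talk does not show this code; the sketch below is an assumption about how such a check could look, with made-up column names. It only redacts by column name, so PII embedded in free-text fields would need separate handling.

```typescript
type DbRow = Record<string, string | number | boolean | null>;

// Hypothetical column names; a real prod agent would derive these from the schema.
const PII_COLUMNS = new Set<string>(["email", "full_name", "phone", "ip_address"]);
const REDACTED = "[REDACTED]";

// Replace the value of every PII column, leaving all other columns untouched.
function redactRow(row: DbRow): DbRow {
  const clean: DbRow = {};
  for (const key of Object.keys(row)) {
    const value = row[key];
    clean[key] = PII_COLUMNS.has(key) ? REDACTED : (value ?? null);
  }
  return clean;
}

function redactSlice(rows: DbRow[]): DbRow[] {
  return rows.map(redactRow);
}

// Last-line check before a slice is sent to a peer: throws on any raw PII column.
function assertNoPii(rows: DbRow[]): void {
  for (const row of rows) {
    for (const key of Object.keys(row)) {
      if (PII_COLUMNS.has(key) && row[key] !== REDACTED) {
        throw new Error(`PII leak in column ${key}`);
      }
    }
  }
}

// Made-up rows for the lockout scenario.
const rawSlice: DbRow[] = [
  { id: 7, email: "user@example.com", plan: "pro", locked: true },
];
const safeSlice = redactSlice(rawSlice);
assertNoPii(safeSlice);
```

Keeping plan and locked intact is what lets the dev agent reproduce the bug; the redaction only has value if the fields needed for the repro survive it.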

Demo 2: feature-parity between two cloud sandboxes

The second demo is a research-and-build task: produce a new Pi skill for exe.dev that matches the feature surface of an existing E2B skill. Setup:

E2B agent: holds the canonical E2B skill, has refreshed its context on every feature, command, and quirk. Answers questions from its peer.
exe.dev agent: drives the build. Reads the exe.dev docs, asks the E2B agent targeted questions to confirm parity, produces the new skill plus a feature-parity document that flags differences.

The result the talk highlights: ten corrections came out of the exchange. These were claims the E2B agent could have gotten wrong had it been working alone, caught because the peer agent was actively validating them. This is the verifier pattern applied laterally between peers, not vertically from supervisor to worker.

The deeper point about context

You could put both tasks in one agent. The talk argues you should not. A focused context window is a more reliable context window. Two agents at 200k each, each focused on one tool, outperform one agent at 400k spanning both. The lesson holds independently of Pi: keep each agent's context no larger than its task requires.

Pros and cons, stated honestly

The talk ends each pattern with a trade-off section. This is the one for peer-to-peer.

Pros

It is just an agent. No new runtime, no orchestrator process, no resume mechanism. Boot a Pi instance, install the extension, it joins the pool.

End-to-end customizable. You own the protocol because you own the harness.

Flat by construction. No information loss in the chain of command, because there is no chain.

Primitive over composition. Once you have one agent, you can compose any number. Composition is an engineering pattern; primitives are what make it cheap.

Cons

You build it. Or fork the speaker's reference. Either way you own the prompts, the context engineering, and the edge cases.

Loops are possible. Sloppy prompts produce sloppy back-and-forths and burn tokens. Define an end state.

Cost scales linearly. Agent count plus communication bounce. There is a useful upper bound; past it, more agents stop helping.

Easy to slip back into orchestration. If you find one peer doing all the directing, you have an orchestrator with extra steps. That is fine if it is what you need; just be honest about it.

How this connects back to the five pillars

Harness ownership (Pillar 1): the entire pattern is unavailable inside a rented product. Owning Pi means you can add a four-tool extension and it works.
Software factory (Pillar 2): peer-to-peer is a topology for the factory floor. Specialized peers replace a single overloaded worker.
Extensible software (Pillar 3): the comms layer ships as an extension, not a core change. The same harness that runs single-agent runs the network pool.
Always-on agents (Pillar 4): a verifier peer always listening for messages is a low-cost AFK agent that earns its tokens.
Agentic access (Pillar 5): the network is now an API the agent reaches over. Other agents become a tool surface.

Source notes

The reference extensions live in the speaker's "Pi vs Claude Code" codebase (linked from his channel; see agenticengineer.com). Pi itself is at pi.dev. For the sandbox tools used in Demo 2: E2B and exe.dev. The "verifier pattern" referenced in passing is documented in the speaker's prior video on validator agents (linked from his channel).

One pull-quote to take with you

"The tool you use limits what you believe is possible. With the Pi agent harness, I see no limits." — Chapter on pros and cons of Pi-to-Pi communication

That is overstatement on purpose. The honest read: a harness you can extend in an afternoon expands the space of patterns you will even try. Most engineers never try peer-to-peer because their tool does not let them.

Chapter 09·Reference

The Pi coding agent #

Pi is the talk's worked example of an extensible, ownable harness. This chapter is a structured reference, not a tutorial. For the full surface area see the Pi docs and the source on GitHub.

What Pi is

Pi is a minimal terminal coding harness built by Earendil Inc. (lead author: Mario Zechner). The tagline on pi.dev is "There are many agent harnesses, but this one is yours." The thesis: ship primitives, not features. Anything Pi does not include can be built as an extension or installed from a third-party package.

Why it shows up in the talk

Pi is the speaker's daily driver and the reason he can claim to be "building one new custom agent harness every single day." A composable harness reduces the cost of a custom harness from "fork the product" to "write an extension."

Surface area, in one page

Modes: Interactive TUI; print/JSON for scripts (pi -p "query"); RPC over stdin/stdout for non-Node integrations; SDK for embedding in apps.
Providers and models: 15+ providers, hundreds of models. Anthropic, OpenAI, Google, Azure, Bedrock, Mistral, Groq, Cerebras, xAI, Hugging Face, Kimi For Coding, MiniMax, OpenRouter, Ollama. Mid-session model switch with /model or Ctrl+L. Custom providers via models.json.
Sessions: Tree-structured. Navigate with /tree. Export with /export. Share via /share to a gist-backed URL.
Context engineering: Minimal system prompt by design. Project instructions via AGENTS.md. Per-project override via SYSTEM.md. Customizable compaction. Skills (on-demand capability packages). Prompt templates (reusable Markdown prompts, invoked with /name). Dynamic context injection via extensions.
Extensions: TypeScript modules with access to tools, commands, keyboard shortcuts, events, and the full TUI. 50+ examples in the repo.
Steering: Enter sends a steering message (delivered after the current tool, interrupts the rest). Alt+Enter queues a follow-up that waits until the agent finishes.

What Pi explicitly does not include

From the homepage, deliberately omitted features and the recommended workaround for each:

No MCP. Build CLI tools with READMEs (Pi's "skills"), or add MCP via an extension.
No sub-agents. Spawn Pi instances via tmux, or build via an extension.
No permission popups. Run in a container, or build a confirmation flow via an extension.
No plan mode. Write plans to files, or build with an extension.
No built-in to-dos. Use a TODO.md file.
No background bash. Use tmux for full observability and direct interaction.

A note on the name

The talk pronounces it "Pi" (as in π). It is sometimes written informally as "py" in transcripts because of the sound. The package is @earendil-works/pi-coding-agent on npm. The domain is pi.dev.

Install

From the homepage, the shell installer (one of several supported invocations):

curl -fsSL https://pi.dev/install.sh | sh

Or via npm, pnpm, bun, or PowerShell. See the docs for current instructions.

Where to go for more

The design rationale lives in Mario Zechner's launch post and his "What if you don't need MCP?" essay. Community is on Discord. Package directory at pi.dev/packages. License is MIT.

Chapter 10·Deep dive

Harness architecture from first principles #

Strip away the branding. An agent harness is six pieces wired together: a model client, a tool registry, a context strategy, a message store, an extension bus, and a UI surface. This chapter defines each piece in terms that do not depend on Pi. The next chapter shows how Pi instantiates them.

The agent loop, as an algorithm

Every coding agent runs the same loop. Naming and storage differ; the shape does not.

// The universal agent loop
async function agentTurn(state: AgentState): Promise<AgentState> {
  while (true) {
    // 1. Build the context the model will see
    const messages = state.contextStrategy.build(state.session, state.systemPrompt);

    // 2. Call the model with available tools
    const response = await state.model.complete({
      messages,
      tools: state.tools.activeSchemas(),
      signal: state.abortSignal,
    });

    // 3. Persist the assistant message
    state.session.append({ role: "assistant", content: response.content, usage: response.usage });

    // 4. If no tool calls, we are done
    const toolCalls = response.content.filter(c => c.type === "toolCall");
    if (toolCalls.length === 0) return state;

    // 5. Execute each tool call (after preflight hooks)
    for (const call of toolCalls) {
      const blockResult = await state.hooks.fire("tool_call", call);
      if (blockResult?.block) {
        state.session.append({ role: "toolResult", toolCallId: call.id,
          content: [{type:"text", text: blockResult.reason}], isError: true });
        continue;
      }
      const result = await state.tools.execute(call, state.abortSignal);
      const patched = await state.hooks.fire("tool_result", result) ?? result;
      state.session.append({ role: "toolResult", toolCallId: call.id, ...patched });
    }

    // 6. Loop back: assistant will likely respond to tool results
  }
}

Read it twice. Everything else in this wiki is structure around this loop. The model is the engine, the loop is the crankshaft, the rest is gearing.
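
To make the shape concrete, here is a toy miniature of the loop that runs as-is. It is a sketch under loud assumptions: the "model" is a scripted function rather than a provider client, everything is synchronous, and there are no hooks. All names (`runLoop`, `scripted`, the `echo` tool) are hypothetical and exist only for this illustration.

```typescript
// Toy miniature of the agent loop. Illustrative only: the model is a
// scripted function, not a real provider client, and nothing is streamed.
type TextBlock = { type: "text"; text: string };
type ToolCallBlock = { type: "toolCall"; id: string; name: string; args: { value: string } };
type Block = TextBlock | ToolCallBlock;

type LogEntry =
  | { role: "user"; text: string }
  | { role: "assistant"; content: Block[] }
  | { role: "toolResult"; toolCallId: string; text: string };

type FakeModel = (log: LogEntry[]) => Block[];
type ToolFn = (args: { value: string }) => string;

function runLoop(
  log: LogEntry[],
  model: FakeModel,
  tools: Record<string, ToolFn | undefined>,
  maxTurns = 10,
): LogEntry[] {
  for (let turn = 0; turn < maxTurns; turn++) {
    const content = model(log);                  // steps 1-2: build context, call model
    log.push({ role: "assistant", content });    // step 3: persist the assistant message
    const calls = content.filter((b): b is ToolCallBlock => b.type === "toolCall");
    if (calls.length === 0) return log;          // step 4: no tool calls, done
    for (const call of calls) {                  // step 5: execute each tool call
      const tool = tools[call.name];
      const text = tool !== undefined ? tool(call.args) : `unknown tool: ${call.name}`;
      log.push({ role: "toolResult", toolCallId: call.id, text });
    }
  }                                              // step 6: loop back
  throw new Error("maxTurns exceeded");
}

// Scripted model: request one tool call, then answer using its result.
const scripted: FakeModel = (log) => {
  const last = log[log.length - 1];
  if (last !== undefined && last.role === "toolResult") {
    return [{ type: "text", text: `Tool said: ${last.text}` }];
  }
  return [{ type: "toolCall", id: "call-1", name: "echo", args: { value: "hi" } }];
};

const finalLog = runLoop(
  [{ role: "user", text: "say hi" }],
  scripted,
  { echo: (args) => args.value.toUpperCase() },
);
// finalLog holds four entries: user, assistant (tool call), toolResult, assistant (text)
```

The `maxTurns` guard is an addition of this sketch; the loop in the text runs until the model stops calling tools.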

The six pieces in detail

1. Model client

A typed wrapper over one provider's HTTP API. It accepts a normalized message array and a tool schema, returns a stream of content blocks (text, thinking, toolCall) plus token usage. The minimum surface:

interface ModelClient {
  readonly provider: string;
  readonly id: string;
  readonly contextWindow: number;
  readonly capabilities: { reasoning: boolean; vision: boolean; toolUse: boolean };

  complete(args: {
    messages: NormalizedMessage[];
    tools: ToolSchema[];
    systemPrompt?: string;
    thinkingLevel?: ThinkingLevel;
    signal?: AbortSignal;
  }): AsyncIterable<StreamEvent>;
}

Pi separates this into an API kind (anthropic-messages, openai-completions, openai-responses, etc.) and a provider (Anthropic, OpenAI, OpenRouter, Bedrock, Ollama, ...). Providers register models; models pick an API kind. This is why Pi supports 15+ providers without 15+ adapter rewrites: there are only ~5 wire formats.
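
A minimal sketch of that split, assuming a much simpler shape than Pi's real types: one adapter per wire format, and providers reduced to configuration that points at an adapter. The `adapters` table, `ProviderConfig`, and `endpointFor` are hypothetical names for this illustration; only the two API-kind strings come from the text above.

```typescript
// Illustrative only: simplified shapes, not Pi's internal types.
type ApiKind = "anthropic-messages" | "openai-completions";

interface ProviderConfig {
  name: string;
  baseUrl: string;
  api: ApiKind;
}

// One adapter per wire format. Here an "adapter" only builds the endpoint URL.
const adapters: Record<ApiKind, (baseUrl: string) => string> = {
  "anthropic-messages": (baseUrl) => `${baseUrl}/v1/messages`,
  "openai-completions": (baseUrl) => `${baseUrl}/v1/chat/completions`,
};

// Providers are configuration: a base URL plus the wire format they speak.
const providers: ProviderConfig[] = [
  { name: "openai", baseUrl: "https://api.openai.com", api: "openai-completions" },
  { name: "local", baseUrl: "http://localhost:1234", api: "openai-completions" },
  { name: "anthropic", baseUrl: "https://api.anthropic.com", api: "anthropic-messages" },
];

function endpointFor(p: ProviderConfig): string {
  return adapters[p.api](p.baseUrl);
}

// Three providers, two adapters: adding a provider did not add an adapter.
const distinctAdapters = new Set(providers.map((p) => p.api)).size;
```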

2. Tool registry

A dictionary of named functions exposed to the model, each with a JSONSchema-typed parameter set and an executor.

interface ToolDefinition<P> {
  name: string;             // canonical name (lowercase, snake_case)
  label: string;            // human label for UI
  description: string;      // shown to the model
  parameters: JSONSchema;   // validated before execute()
  execute(
    toolCallId: string,
    params: P,
    signal: AbortSignal,
    onUpdate?: (partial: ToolResult) => void,  // streaming progress
    ctx?: ToolContext
  ): Promise<ToolResult>;
}

interface ToolResult {
  content: ContentBlock[];   // text or image
  details?: unknown;         // arbitrary metadata, not sent to LLM
  isError?: boolean;
}

Two design choices matter. First, the parameter schema goes to the model verbatim — the model decides what arguments to send based on the schema's description fields. Vague schemas produce vague calls. Second, details is for the UI and for downstream extensions; the LLM only sees content.
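
The second choice is easy to show in isolation. This is a sketch, not Pi's code: `toModelView` is a hypothetical helper that builds what the model would see from a tool result, and the result shape is trimmed to text blocks only.

```typescript
// Sketch: what reaches the model versus what stays inside the harness.
interface TextBlock { type: "text"; text: string }
interface SketchToolResult {
  content: TextBlock[];
  details?: unknown;   // UI and downstream extensions only
  isError?: boolean;
}

// Hypothetical helper: keep content, drop details.
function toModelView(result: SketchToolResult): { content: TextBlock[] } {
  return { content: result.content };
}

const listing: SketchToolResult = {
  content: [{ type: "text", text: "3 files found" }],
  details: { paths: ["a.ts", "b.ts", "c.ts"], elapsedMs: 12 },
};

const modelView = toModelView(listing);
// modelView carries the text block; the paths and timing never enter the prompt.
```

The practical consequence: anything the model needs in order to act must be in `content`. Structured data you only want to render or replay belongs in `details`.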

3. Context strategy

A pure function that takes the current session and produces the message list the model will see. The naive version is "return all messages." The realistic version handles compaction, branch summaries, system-prompt assembly, and tool-result truncation.

interface ContextStrategy {
  build(session: SessionStore, systemPrompt: string): NormalizedMessage[];
  estimateTokens(messages: NormalizedMessage[], model: ModelClient): number;
  shouldCompact(used: number, window: number, reserve: number): boolean;
}

Pi's default reserves 16,384 tokens for the response, keeps the most recent ~20,000 tokens of conversation verbatim, summarizes the rest into a CompactionEntry, and rebuilds the context from [system, summary, kept...]. See the compaction docs for the exact algorithm. Chapter 12 of this wiki walks through it.

4. Message store (session)

An append-only log of typed entries. Entries have parent pointers so the log is actually a tree — branching is a first-class operation, not a fork-the-file workaround. Pi stores it as JSONL with one entry per line; reconstruction is a single pass.

interface SessionEntry {
  type: string;            // "message" | "compaction" | "model_change" | ...
  id: string;              // 8-char hex
  parentId: string | null; // null for root
  timestamp: string;       // ISO
}

interface SessionStore {
  append(entry: Omit<SessionEntry, "id" | "parentId" | "timestamp">): string;
  getLeafId(): string;
  getEntry(id: string): SessionEntry | undefined;
  getBranch(fromId?: string): SessionEntry[];   // root → leaf
  branch(toEntryId: string): void;              // move leaf back
  getChildren(parentId: string): SessionEntry[];
}

The session is the source of truth for everything you can replay: model changes, tool calls, compactions, even extension state. Chapter 12 documents Pi's entry types in full.
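
A small in-memory version shows why branching is cheap when entries carry parent pointers. This is a sketch, not Pi's SessionManager: `MiniSessionTree` is a hypothetical class, ids come from a counter rather than a random source, and nothing touches disk.

```typescript
// Minimal in-memory session tree. Illustrative only.
interface TreeEntry {
  id: string;
  parentId: string | null;
  type: string;
  text: string;
}

class MiniSessionTree {
  private entries = new Map<string, TreeEntry>();
  private leafId: string | null = null;
  private counter = 0;

  // Append under the current leaf, then advance the leaf to the new entry.
  append(type: string, text: string): string {
    this.counter += 1;
    const id = this.counter.toString(16).padStart(8, "0"); // 8-char hex
    this.entries.set(id, { id, parentId: this.leafId, type, text });
    this.leafId = id;
    return id;
  }

  // Branching only moves the leaf pointer; nothing is copied or deleted.
  branch(toEntryId: string): void {
    if (!this.entries.has(toEntryId)) throw new Error(`unknown entry: ${toEntryId}`);
    this.leafId = toEntryId;
  }

  // Follow parent pointers from the leaf, then reverse to get root → leaf.
  getBranch(): TreeEntry[] {
    const path: TreeEntry[] = [];
    let cursor = this.leafId;
    while (cursor !== null) {
      const entry = this.entries.get(cursor);
      if (entry === undefined) break;
      path.push(entry);
      cursor = entry.parentId;
    }
    return path.reverse();
  }

  size(): number {
    return this.entries.size;
  }
}

const tree = new MiniSessionTree();
const first = tree.append("message", "refactor auth");
tree.append("message", "attempt A");
tree.branch(first);                  // rewind to the first entry
tree.append("message", "attempt B");

const activeTexts = tree.getBranch().map((e) => e.text);
// activeTexts is ["refactor auth", "attempt B"]; "attempt A" is still stored.
```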

5. Extension bus (hooks)

A typed pub/sub layered over the loop. Extensions subscribe to lifecycle events; the loop awaits handlers and respects their return values. The contract every harness eventually converges on:

type Hook =
  | "session_start" | "session_shutdown"
  | "before_agent_start" | "agent_start" | "agent_end"
  | "turn_start" | "turn_end"
  | "context"                      // mutate messages before send
  | "before_provider_request"      // mutate raw provider payload
  | "after_provider_response"      // inspect HTTP response
  | "tool_call"                    // block or mutate input
  | "tool_result"                  // mutate output
  | "user_bash"                    // intercept ! and !! commands
  | "input"                        // intercept user input
  | "model_select" | "thinking_level_select"
  | "session_before_compact" | "session_compact"
  | "session_before_tree" | "session_tree"
  | "session_before_fork" | "session_before_switch";

interface HookBus {
  on<E extends Hook>(event: E, handler: HookHandler<E>): Disposable;
  fire<E extends Hook>(event: E, payload: HookPayload<E>): Promise<HookResult<E>>;
}

This is the architectural lever that makes harness ownership cheap. Adding a new behavior is "subscribe to one hook and return a value" rather than "fork the loop."
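
Here is how small that lever can be. The sketch below is a single-event bus, assuming synchronous handlers and a "first block wins" rule; both are simplifications chosen for this example, and `MiniHookBus` is a hypothetical name. Pi's real bus is typed per event and awaits async handlers.

```typescript
// Sketch of a hook bus for one event kind. Shapes are assumptions modeled
// on the tool_call contract described in this chapter.
interface ToolCallPayload {
  name: string;
  input: { command: string };
}
type ToolCallVerdict = { block: true; reason: string } | undefined;
type ToolCallHandler = (payload: ToolCallPayload) => ToolCallVerdict;

class MiniHookBus {
  private handlers: ToolCallHandler[] = [];

  // Subscribe; the returned function unsubscribes.
  on(handler: ToolCallHandler): () => void {
    this.handlers.push(handler);
    return () => {
      this.handlers = this.handlers.filter((h) => h !== handler);
    };
  }

  // Run handlers in registration order; stop at the first block.
  fire(payload: ToolCallPayload): ToolCallVerdict {
    for (const handler of this.handlers) {
      const verdict = handler(payload);
      if (verdict?.block) return verdict;
    }
    return undefined;
  }
}

const bus = new MiniHookBus();
const seen: string[] = [];

// Behavior 1: audit log. Observes, never blocks.
bus.on((p) => {
  seen.push(p.input.command);
  return undefined;
});

// Behavior 2: policy gate. Blocks anything containing "sudo".
const off = bus.on((p) =>
  p.input.command.includes("sudo") ? { block: true, reason: "no sudo" } : undefined,
);

const blocked = bus.fire({ name: "bash", input: { command: "sudo reboot" } });
off(); // remove the gate without touching the loop
const allowed = bus.fire({ name: "bash", input: { command: "sudo reboot" } });
```

Both behaviors were added and one was removed without editing `fire` or the loop that would call it. That is the whole argument for hooks.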

6. UI surface

The terminal is the canonical Pi target, but the abstraction is wider: a UI surface is anything that can show messages, accept input, render tool calls, and prompt the user for confirmation. Pi exposes four UI modes — interactive TUI, print/JSON for scripts, RPC for subprocess clients, and an SDK for embedding — all served by the same loop, the same session store, and the same extensions.

The whole architecture in one paragraph

A session is a tree of entries on disk. A model client streams content from a provider. A tool registry exposes typed functions to the model. A context strategy decides what slice of the session goes into each model call. An extension bus lets you intercept every step. A UI surface renders the loop to a human or a program. The agent loop wires all six together. That is the entire harness.

Where Pi made specific choices

Piece	Pi's choice	Rationale (from the docs)
Model client	One `api` string per provider (anthropic-messages, openai-completions, ...). Custom providers via `pi.registerProvider()`.	Most providers map onto ~5 wire formats. Treat the wire format as the abstraction, the provider as configuration.
Tool registry	JSONSchema via TypeBox. Tools defined with `defineTool()`; extensions register at any time via `pi.registerTool()`.	Schemas are part of the prompt the model sees. TypeBox gives you static types and runtime validation from one definition.
Context strategy	Reserve 16,384 tokens for the response; keep the most recent ~20,000 tokens; summarize the rest. Customizable per project, replaceable via extension.	Default that works; escape hatch that does not require forking.
Message store	JSONL tree with 8-char hex IDs and `parentId` links. Versioned (currently v3).	Append-only is robust. Trees enable in-place branching without copying files.
Extension bus	30+ typed events. Handlers chain in load order. Some can block or mutate.	Cover every interesting decision point with a hook so the core never has to know about the feature you want to add.
UI surface	Interactive TUI, print/JSON, RPC over stdin/stdout JSONL, SDK for embedding.	Four shapes is enough to cover human terminals, shell scripts, language-agnostic clients, and same-process embedding.

If you internalize one thing

The agent loop is small. The session is small. The model client is small. The size of a useful harness comes from the hooks, because hooks are where features that other tools bake in become things you compose. This is the architectural translation of "primitives over features."

Chapter 11·Deep dive

The Pi extension API

Every behavior the talk attributes to harness ownership eventually reduces to writing an extension. This chapter is a working reference, drawn from the Pi extensions docs, covering the types and the events that matter most.

The minimum extension

A Pi extension is a TypeScript module with a default-exported factory. Pi loads it via jiti, so no compile step. The factory receives ExtensionAPI; that is the entire injection.

// ~/.pi/agent/extensions/hello.ts
import type { ExtensionAPI } from "@earendil-works/pi-coding-agent";

export default function (pi: ExtensionAPI) {
  pi.on("session_start", async (_event, ctx) => {
    ctx.ui.notify("Extension loaded!", "info");
  });
}

Save the file. Pi auto-discovers it on next launch. To run with an extension without installing globally:

pi -e ./hello.ts

Discovery: where Pi looks

Path	Scope
`~/.pi/agent/extensions/*.ts`	Global, all projects
`~/.pi/agent/extensions/*/index.ts`	Global, multi-file extensions
`.pi/extensions/*.ts`	Project-local, checked into git
`settings.json` → `packages: ["npm:..."]`	Shared via npm / git
`--extension path` CLI flag	One-off without installing

Async factories for setup work

If the factory returns a Promise, Pi awaits it before session_start fires. Use this to fetch remote configuration or discover models, so they are available immediately (including to pi --list-models).

import type { ExtensionAPI } from "@earendil-works/pi-coding-agent";

export default async function (pi: ExtensionAPI) {
  // Discover models from a local OpenAI-compatible server before session_start
  const r = await fetch("http://localhost:1234/v1/models");
  if (!r.ok) throw new Error(`Model discovery failed: HTTP ${r.status}`);
  const { data } = (await r.json()) as { data: Array<{id: string; context_window?: number}> };

  pi.registerProvider("local-openai", {
    baseUrl: "http://localhost:1234/v1",
    apiKey: "LOCAL_OPENAI_API_KEY",
    api: "openai-completions",
    models: data.map(m => ({
      id: m.id, name: m.id, reasoning: false, input: ["text"],
      cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
      contextWindow: m.context_window ?? 128000, maxTokens: 4096,
    })),
  });
}

The event lifecycle

From the Pi docs, the order of events around a single user prompt:

pi starts
  ├─► session_start { reason: "startup" }
  └─► resources_discover { reason: "startup" }

user sends prompt
  ├─► (extension commands checked first, bypass loop if found)
  ├─► input  (intercept / transform / handle)
  ├─► (skill and template expansion if not handled)
  ├─► before_agent_start  (inject message, modify system prompt)
  ├─► agent_start
  │
  │   ┌─── turn (repeats while LLM calls tools) ───────────┐
  │   │ turn_start                                         │
  │   │ context                       (mutate messages)    │
  │   │ before_provider_request       (replace payload)    │
  │   │ after_provider_response       (inspect headers)    │
  │   │ message_start / message_update / message_end       │
  │   │   tool_execution_start                             │
  │   │   tool_call                   (BLOCK or mutate)    │
  │   │   tool_execution_update                            │
  │   │   tool_result                 (mutate output)      │
  │   │   tool_execution_end                               │
  │   │ turn_end                                           │
  │   └────────────────────────────────────────────────────┘
  └─► agent_end

Three properties matter. (1) Handlers run in extension load order. (2) Mutations chain — later handlers see earlier handlers' changes. (3) Some events block; some can return a replacement payload; most are notification-only. The return value's effect is part of the event's contract, not a global rule.

The high-value hooks, with signatures

`tool_call` — preflight, block, or mutate

Fired after tool_execution_start, before the tool runs. The handler can mutate event.input in place (later handlers and the tool itself see the mutation, no re-validation) and can return { block: true, reason } to short-circuit.

import { isToolCallEventType } from "@earendil-works/pi-coding-agent";

pi.on("tool_call", async (event, ctx) => {
  if (isToolCallEventType("bash", event)) {
    // Add a profile sourcing prefix to every shell command
    event.input.command = `source ~/.profile\n${event.input.command}`;

    if (/\brm\s+-rf\b/.test(event.input.command)) {
      const ok = await ctx.ui.confirm("Dangerous!", "Allow rm -rf?");
      if (!ok) return { block: true, reason: "User declined" };
    }
  }
});
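
It is worth testing that pattern on its own before trusting it. The sketch below runs the same regular expression against a handful of sample commands (the samples are made up for this illustration) and shows what it catches and what it misses.

```typescript
// The same pattern as the handler above, exercised in isolation.
const rmRf = /\brm\s+-rf\b/;

// [command, whether the pattern matches]
const samples: Array<[string, boolean]> = [
  ["rm -rf build", true],
  ["rm    -rf build", true],                  // \s+ tolerates extra whitespace
  ["rm -fr build", false],                    // flag order differs: not caught
  ["rm -r -f build", false],                  // split flags: not caught
  ["rm --recursive --force build", false],    // long flags: not caught
];

const results = samples.map(([cmd]) => rmRf.test(cmd));
```

Treat a pattern like this as a convenience prompt, not a security boundary. A gate that has to hold needs to parse the command or restrict what the shell can reach.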

`tool_result` — middleware over outputs

Fired after the tool returns and before the result message is appended to the session. Handlers chain like middleware; each sees the latest patched result.

import { isBashToolResult } from "@earendil-works/pi-coding-agent";

pi.on("tool_result", async (event, ctx) => {
  if (!isBashToolResult(event)) return;
  // Send to a redaction service before the LLM sees it
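  const r = await fetch("https://redactor.internal/scrub", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ content: event.content }),
    signal: ctx.signal,
  });
  // Fail closed: if redaction fails, withhold the output instead of leaking it
  if (!r.ok) return { content: [{ type: "text", text: "[output withheld: redaction failed]" }] };
  const { content } = await r.json();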
  return { content };  // partial patch; details / isError unchanged
});
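
The chaining rule is easier to see without the network call. This sketch assumes synchronous handlers and a simplified result whose `content` is a plain string; `applyHandlers`, `redact`, and `clip` are hypothetical names. Each handler returns a partial patch or nothing, and each sees the result as patched so far.

```typescript
// Sketch of middleware-style chaining with partial patches.
interface ResultState {
  content: string;
  details: unknown;
  isError: boolean;
}
type ResultPatch = Partial<ResultState> | undefined;
type ResultHandler = (current: ResultState) => ResultPatch;

function applyHandlers(initial: ResultState, handlers: ResultHandler[]): ResultState {
  let current = initial;
  for (const handler of handlers) {
    const patch = handler(current);              // sees earlier handlers' patches
    if (patch !== undefined) current = { ...current, ...patch };
  }
  return current;
}

// Handler 1: mask anything that looks like a key (pattern is illustrative).
const redact: ResultHandler = (r) => ({
  content: r.content.replace(/sk-[A-Za-z0-9]+/g, "[redacted]"),
});

// Handler 2: clip long output. Returns undefined when nothing needs changing.
const clip: ResultHandler = (r) =>
  r.content.length > 30 ? { content: r.content.slice(0, 30) + "..." } : undefined;

const patched = applyHandlers(
  { content: "token=sk-abc123 loaded", details: { exitCode: 0 }, isError: false },
  [redact, clip],
);
// patched.content is "token=[redacted] loaded"; details and isError are untouched.
```

Order matters here in the same way load order matters in Pi: if `clip` ran first on a long output, it could cut a key in half and leave a fragment the redaction pattern no longer recognizes.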

`before_agent_start` — modify system prompt or inject a message

pi.on("before_agent_start", (event, ctx) => {
  const current = ctx.getSystemPrompt();
  return { systemPrompt: current + "\n\nNever modify files in /etc." };
});

`context` — last-mile message mutation

pi.on("context", (event, ctx) => {
  // event.messages is the array about to go to the model.
  // Mutate it in place or return a replacement.
  // shouldHide is your own predicate, not part of Pi.
  return { messages: event.messages.filter(m => !shouldHide(m)) };
});

`input` — intercept user text before processing

pi.on("input", async (event, ctx) => {
  if (event.text.startsWith("?quick ")) {
    return { action: "transform", text: `Respond briefly: ${event.text.slice(7)}` };
  }
  if (event.text === "ping") {
    ctx.ui.notify("pong", "info");
    return { action: "handled" };   // skip agent entirely
  }
  return { action: "continue" };
});

`session_before_compact` — custom compaction

import { convertToLlm, serializeConversation } from "@earendil-works/pi-coding-agent";

pi.on("session_before_compact", async (event, ctx) => {
  const { preparation, signal } = event;
  const text = serializeConversation(convertToLlm(preparation.messagesToSummarize));
  const summary = await myCustomModel.summarize(text, { signal });
  return {
    compaction: {
      summary,
      firstKeptEntryId: preparation.firstKeptEntryId,
      tokensBefore: preparation.tokensBefore,
    }
  };
});

ExtensionAPI methods, by purpose

Register things

pi.registerTool(definition)        // LLM-callable tool, schema via TypeBox
pi.registerCommand(name, options)  // Slash command: /name
pi.registerShortcut(keys, options) // Keyboard shortcut
pi.registerFlag(name, options)     // CLI flag, read via pi.getFlag(name)
pi.registerProvider(name, config)  // Model provider (with OAuth optional)
pi.registerMessageRenderer(type, renderer) // Custom TUI rendering

Talk to the agent

pi.sendMessage(message, options?)       // Inject custom message into session
pi.sendUserMessage(content, options?)   // Send a user message (triggers turn)
pi.appendEntry(customType, data?)       // Persist extension state (no LLM context)
pi.setSessionName(name)                 // Display name for /resume
pi.setLabel(entryId, label?)            // Bookmark/marker on an entry

Inspect or control the runtime

pi.getActiveTools() / pi.getAllTools() / pi.setActiveTools(names)
pi.setModel(model) / pi.setThinkingLevel(level)
pi.getCommands()
pi.exec(command, args, options?)        // Run a shell command (typed result)
pi.events.on / pi.events.emit           // Shared event bus for extension ↔ extension

Custom tools: a complete example

import { Type, type Static } from "typebox";

const greetSchema = Type.Object({
  name: Type.String({ description: "Name to greet" }),
  enthusiasm: Type.Optional(Type.Integer({ minimum: 0, maximum: 5, default: 1 })),
});
export type GreetInput = Static<typeof greetSchema>;

pi.registerTool({
  name: "greet",
  label: "Greet",
  description: "Greet someone by name with controllable enthusiasm",
  parameters: greetSchema,
  promptSnippet: "Greet a person, optionally with extra enthusiasm",
  promptGuidelines: [
    "Use greet when the user explicitly asks for a salutation.",
    "Use greet with enthusiasm=3 or higher only when the user signals it.",
  ],
  async execute(toolCallId, params, signal, onUpdate, ctx) {
    onUpdate?.({ content: [{ type: "text", text: "Composing greeting..." }] });
    const bangs = "!".repeat(params.enthusiasm ?? 1);
    return {
      content: [{ type: "text", text: `Hello, ${params.name}${bangs}` }],
      details: { name: params.name, enthusiasm: params.enthusiasm },
    };
  },
});

Note: promptSnippet opts the tool into the system prompt's "Available tools" section; promptGuidelines appends bullets to the "Guidelines" section. Guidelines are merged flat across all tools, so always name the tool in the guideline text ("Use greet when...", never "Use this tool when...").
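
A sketch of why the naming rule matters, assuming the merge is a plain concatenation as described. `mergeGuidelines` and the two tool entries are hypothetical; the point is only what a bullet looks like once it has lost its tool.

```typescript
// Sketch: guidelines from every tool end up in one flat list.
interface ToolPromptInfo {
  name: string;
  promptGuidelines: string[];
}

function mergeGuidelines(tools: ToolPromptInfo[]): string[] {
  return tools.flatMap((t) => t.promptGuidelines);   // no per-tool grouping survives
}

const merged = mergeGuidelines([
  { name: "greet", promptGuidelines: ["Use this tool when the user asks for a salutation."] },
  { name: "deploy", promptGuidelines: ["Use deploy only after tests pass."] },
]);

// Once merged, "this tool" no longer points at anything.
const ambiguous = merged.filter((g) => g.includes("this tool"));
```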

Custom commands with autocomplete

import type { AutocompleteItem } from "@earendil-works/pi-tui";

pi.registerCommand("deploy", {
  description: "Deploy to an environment",
  getArgumentCompletions: (prefix: string): AutocompleteItem[] | null => {
    const envs = ["dev", "staging", "prod"];
    const items = envs
      .filter(e => e.startsWith(prefix))
      .map(value => ({ value, label: value }));
    return items.length ? items : null;
  },
  handler: async (args, ctx) => {
    await ctx.waitForIdle();
    ctx.ui.notify(`Deploying: ${args}`, "info");
    // ctx.fork / ctx.newSession / ctx.switchSession / ctx.navigateTree available here
  },
});

State persistence

Two places to put state. Use tool_result.details if the state belongs to a specific tool invocation (this gives you correct behavior across branches and forks). Use pi.appendEntry(customType, data) for opaque extension state that you want to survive restarts. Recover on session_start by walking entries and filtering on customType.
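
The recovery walk can be sketched without Pi at all. The entry shape below is trimmed to the fields that matter, `recoverState` is a hypothetical helper, and "the latest matching entry wins" is an assumption of this sketch rather than a rule stated in the docs.

```typescript
// Sketch: rebuild extension state from persisted entries.
interface PersistedEntry {
  type: string;
  customType?: string;
  data?: unknown;
}

function recoverState<T>(
  entries: PersistedEntry[],
  customType: string,
  isT: (d: unknown) => d is T,
): T | undefined {
  let latest: T | undefined;
  for (const e of entries) {
    const data = e.data;
    // Keep only this extension's entries, and validate before trusting them.
    if (e.type === "custom" && e.customType === customType && isT(data)) latest = data;
  }
  return latest;
}

// Example state for a hypothetical "todo" extension.
interface TodoState { items: string[] }
const isTodoState = (d: unknown): d is TodoState =>
  typeof d === "object" && d !== null && Array.isArray((d as { items?: unknown }).items);

const persisted: PersistedEntry[] = [
  { type: "message" },
  { type: "custom", customType: "todo", data: { items: ["a"] } },
  { type: "custom", customType: "other", data: { items: ["x"] } },
  { type: "custom", customType: "todo", data: { items: ["a", "b"] } },
];

const restored = recoverState(persisted, "todo", isTodoState);
// restored is { items: ["a", "b"] }
```

The type guard is there because persisted data is `unknown` until proven otherwise: an older version of your own extension may have written a different shape.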

Extensions run with your full system permissions

An extension is arbitrary TypeScript. Review every third-party extension you install. Use Pi's permission-gate and protected-paths examples as a baseline for sandboxing dangerous tools. The bash tool especially should be wrapped on any machine that touches production assets.

Primary sources

Extensions docs · 50+ example extensions · Keybindings · Themes

Chapter 12·Deep dive

Sessions, compaction, and the tree

A Pi session is a JSONL file whose entries form a tree. Everything you can undo, branch, summarize, or replay lives there. This chapter is the file format, the algorithm that builds the model's context from the tree, and the compaction strategy that keeps long conversations within the window.

File layout

Sessions live at:

~/.pi/agent/sessions/--<path>--/<timestamp>_<uuid>.jsonl

where <path> is the working directory with / replaced by -. One file per session. Append-only on disk. The first line is the header; every subsequent line is an entry with a typed payload.

Header

// Version 3 header (current)
{ "type": "session", "version": 3, "id": "uuid",
  "timestamp": "2024-12-03T14:00:00.000Z", "cwd": "/path/to/project" }
// Optional: parentSession when created via /fork or /clone
{ ... , "parentSession": "/path/to/original/session.jsonl" }

Versions: v1 was linear, v2 introduced the tree, v3 renamed the hookMessage role to custom for extension unification. Older sessions auto-migrate on load.

Entry shape

interface SessionEntryBase {
  type: string;            // "message" | "compaction" | "branch_summary" | ...
  id: string;              // 8-char hex
  parentId: string | null; // null for the first entry after the header
  timestamp: string;       // ISO 8601
}
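
Given that shape, rebuilding the tree from the file takes one pass over the lines. This is a sketch with the entry payload stripped to `type`, `id`, and `parentId`; `indexSession` is a hypothetical helper and the sample ids are made up.

```typescript
// Sketch: index a session file in a single pass over its lines.
interface RawEntry {
  type: string;
  id: string;
  parentId: string | null;
}

function indexSession(jsonl: string) {
  const lines = jsonl.split("\n").filter((l) => l.trim() !== "");
  const [headerLine, ...entryLines] = lines;
  if (headerLine === undefined) throw new Error("empty session file");

  const header = JSON.parse(headerLine) as { type: string; version: number };
  const byId = new Map<string, RawEntry>();
  const children = new Map<string | null, string[]>();

  for (const line of entryLines) {
    const entry = JSON.parse(line) as RawEntry;
    byId.set(entry.id, entry);
    const siblings = children.get(entry.parentId) ?? [];
    siblings.push(entry.id);
    children.set(entry.parentId, siblings);
  }
  return { header, byId, children };
}

const sample = [
  '{"type":"session","version":3,"id":"uuid","timestamp":"2024-12-03T14:00:00.000Z","cwd":"/path/to/project"}',
  '{"type":"message","id":"a1b2c3d4","parentId":null}',
  '{"type":"message","id":"b2c3d4e5","parentId":"a1b2c3d4"}',
  '{"type":"message","id":"c3d4e5f6","parentId":"a1b2c3d4"}',
].join("\n");

const indexed = indexSession(sample);
// Entry a1b2c3d4 has two children, so it is a branch point.
```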

Entry types in production

Type	What it carries	In LLM context?
`message`	A user, assistant, toolResult, bashExecution, custom, branchSummary, or compactionSummary message	Yes (depending on subtype)
`model_change`	Provider + modelId at the moment the user switched	No (state only)
`thinking_level_change`	New thinking level	No
`compaction`	`summary`, `firstKeptEntryId`, `tokensBefore`, optional `details`	Summary is, original messages aren't
`branch_summary`	Summary of an abandoned branch, with `fromId` back-reference	Yes, injected at navigation point
`custom`	Extension state. `customType` identifies the extension. `data` is arbitrary JSON	No
`custom_message`	Extension-injected message that DOES go in context	Yes
`label`	User bookmark on another entry	No
`session_info`	Display name for the session (latest wins)	No

Two message types worth knowing

// Assistant content can mix text, thinking, and tool calls
interface AssistantMessage {
  role: "assistant";
  content: (TextContent | ThinkingContent | ToolCall)[];
  api: string; provider: string; model: string;
  usage: Usage;                  // tokens + cost
  stopReason: "stop" | "length" | "toolUse" | "error" | "aborted";
  timestamp: number;
}

// Bash executions from `!` commands sit in their own message type
interface BashExecutionMessage {
  role: "bashExecution";
  command: string; output: string;
  exitCode: number | undefined;
  cancelled: boolean; truncated: boolean;
  fullOutputPath?: string;       // when output overflowed
  excludeFromContext?: boolean;  // true for `!!` prefix
  timestamp: number;
}

How a tree becomes a context

buildSessionContext() is the function that walks from the current leaf to the root and produces the message list the model sees. The algorithm:

function buildSessionContext(session: SessionStore, systemPrompt: string) {
  const path = session.getBranch();   // [root, ..., leaf]
  const out: NormalizedMessage[] = [{ role: "system", content: systemPrompt }];

  // 1. If the path contains a compaction, find the most recent one
  const lastCompaction = [...path].reverse().find(e => e.type === "compaction");
  if (lastCompaction) {
    out.push({ role: "user", content: `<summary>\n${lastCompaction.summary}\n</summary>` });
    // Then include messages from firstKeptEntryId forward
    const keepFrom = path.findIndex(e => e.id === lastCompaction.firstKeptEntryId);
    // Defensive guard: findIndex returns -1 on a miss, and slice(-1) would keep only the leaf
    if (keepFrom < 0) throw new Error("firstKeptEntryId is not on the active path");
    for (const e of path.slice(keepFrom)) appendIfMessage(out, e);
  } else {
    for (const e of path) appendIfMessage(out, e);
  }

  // 2. appendIfMessage (helper not shown) skips non-message entries and converts
  //    BranchSummaryEntry and CustomMessageEntry into proper messages
  return out;
}

"Branching" never duplicates the file. To branch from an earlier entry, set the leaf back to that entry's id and append. The old branch still exists, just no longer on the active path. SessionManager.branch(entryId) does this; SessionManager.createBranchedSession(leafId) extracts a branch into a new file when you actually want to detach it.

Compaction in detail

Pi triggers compaction when

contextTokens > contextWindow - reserveTokens

with reserveTokens defaulting to 16,384 (configurable). You can also trigger it manually with /compact [instructions].
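
A worked instance of the trigger, as a sketch. The 200,000-token context window is an assumed example value, not a Pi default; 16,384 is the reserve quoted above.

```typescript
// Worked instance of the compaction trigger condition.
function shouldCompact(
  contextTokens: number,
  contextWindow: number,
  reserveTokens = 16_384,
): boolean {
  return contextTokens > contextWindow - reserveTokens;
}

// With an assumed 200,000-token window, the threshold is 200,000 - 16,384.
const threshold = 200_000 - 16_384; // 183,616

const atThreshold = shouldCompact(183_616, 200_000);   // false: comparison is strict
const pastThreshold = shouldCompact(183_617, 200_000); // true
```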

The algorithm:

1. Walk backwards from the leaf, accumulating estimated tokens until keepRecentTokens (default 20,000) is reached. That's the cut point.
2. Collect everything earlier on the active path back to the previous compaction's firstKeptEntryId (or the start). Those are the messages to summarize.
3. Call the model with a structured summary prompt (Goal / Constraints / Progress / Key Decisions / Next Steps / Critical Context + tagged <read-files> and <modified-files>).
4. Append a CompactionEntry with the summary, the kept-from id, and the pre-compaction token count.
5. Rebuild context from the summary plus messages after the cut. The original earlier messages remain in the JSONL file but are no longer in context.

Cut-point rules: cut only at user, assistant, bashExecution, or custom messages. Never cut at a toolResult (it must stay paired with its call). Long single turns ("split turns") are handled by summarizing the early part of the turn separately and merging the two summaries.
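
The walk and the cut-point rule can be sketched together. Two assumptions are baked in and worth flagging: token counts are given per entry rather than estimated, and when the budget is reached on an entry that is not a valid cut point, this sketch keeps moving earlier until it finds one (so it keeps slightly more than the budget). Whether Pi moves earlier or later in that case is not stated here. `findCutIndex` is a hypothetical name, and split turns are not handled.

```typescript
// Sketch of cut-point selection over the active path (root → leaf).
interface CutCandidate {
  id: string;
  role: string;
  tokens: number;
}

const CUTTABLE = new Set(["user", "assistant", "bashExecution", "custom"]);

// Returns the index of the first entry to keep verbatim.
function findCutIndex(path: CutCandidate[], keepRecentTokens: number): number {
  let kept = 0;
  let budgetReached = false;
  // Walk from the leaf backwards.
  for (const [i, entry] of Array.from(path.entries()).reverse()) {
    if (!budgetReached) {
      kept += entry.tokens;
      budgetReached = kept >= keepRecentTokens;
    }
    // Only cut at a valid role, so a toolResult stays paired with its call.
    if (budgetReached && CUTTABLE.has(entry.role)) return i;
  }
  return 0; // budget never reached, or no valid cut point: keep everything
}

const activePath: CutCandidate[] = [
  { id: "u1", role: "user", tokens: 5_000 },
  { id: "a1", role: "assistant", tokens: 8_000 },
  { id: "t1", role: "toolResult", tokens: 12_000 },
  { id: "a2", role: "assistant", tokens: 6_000 },
  { id: "u2", role: "user", tokens: 4_000 },
];

const cutIndex = findCutIndex(activePath, 20_000);
// Walking back: 4,000, then 10,000, then 22,000 at t1. t1 is a toolResult,
// so the cut moves to a1 (index 1). Only u1 is left to summarize.
```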

// CompactionEntry as it appears on disk
{
  "type": "compaction",
  "id": "f6a7b8c9",
  "parentId": "e5f6a7b8",
  "timestamp": "2024-12-03T14:10:00.000Z",
  "summary": "## Goal\nUser wants to refactor auth...\n",
  "firstKeptEntryId": "c3d4e5f6",
  "tokensBefore": 50000,
  "details": { "readFiles": [...], "modifiedFiles": [...] }
}

Branch summaries

When you navigate with /tree to a different branch, Pi offers to summarize what you are leaving behind so that context travels with you. Same summary format as compaction; the entry is branch_summary with a fromId pointing at the old leaf. File operations (read + modified) accumulate across nested branch summaries and compactions.

The structured summary format

## Goal
[What the user is trying to accomplish]

## Constraints & Preferences
- [Requirements mentioned by user]

## Progress
### Done
- [x] [Completed tasks]

### In Progress
- [ ] [Current work]

### Blocked
- [Issues, if any]

## Key Decisions
- **[Decision]**: [Rationale]

## Next Steps
1. [What should happen next]

## Critical Context
- [Data needed to continue]

<read-files>
path/to/file1.ts
</read-files>

<modified-files>
path/to/changed.ts
</modified-files>

Tool results are truncated to 2,000 characters during message serialization before summarization (long bash and read outputs would otherwise dominate the summary's token budget). The structured headings keep the model from treating the summary as a conversation to continue.

SessionManager API surface

// Construction (static)
SessionManager.create(cwd, sessionDir?)
SessionManager.open(path, sessionDir?)
SessionManager.continueRecent(cwd, sessionDir?)
SessionManager.inMemory(cwd?)
SessionManager.forkFrom(sourcePath, targetCwd, sessionDir?)
SessionManager.list(cwd, sessionDir?, onProgress?)
SessionManager.listAll(onProgress?)

// Instance: navigation
sm.getLeafId() / sm.getLeafEntry() / sm.getEntry(id)
sm.getBranch(fromId?)   // path root → entry
sm.getTree() / sm.getChildren(parentId)
sm.branch(entryId)      // move leaf back
sm.branchWithSummary(entryId, summary, details?, fromHook?)
sm.createBranchedSession(leafId)  // extract to new file

// Instance: append (all return entry ID)
sm.appendMessage(message)
sm.appendModelChange(provider, modelId)
sm.appendThinkingLevelChange(level)
sm.appendCompaction(summary, firstKeptEntryId, tokensBefore, details?, fromHook?)
sm.appendCustomEntry(customType, data?)      // state, not in context
sm.appendCustomMessageEntry(customType, content, display, details?)  // in context
sm.appendLabelChange(targetId, label)
sm.appendSessionInfo(name)

// Instance: build the context the model sees
sm.buildSessionContext()

Why JSONL and not a database

The session is the unit of replay. Anything that can be expressed as "append a typed line" is forward-compatible and recoverable from a partial write. A database adds schema migrations, locking, and an opaque binary file that needs its own tooling to inspect. JSONL gives you tail -f and jq as debugging tools out of the box.
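
"Recoverable from a partial write" can be shown in a few lines. This is a sketch of the general JSONL property, not a description of how Pi's loader behaves: `parseJsonlTolerant` is a hypothetical helper that keeps every complete line and counts the ones it could not parse.

```typescript
// Sketch: recover every complete entry from a file whose last write was cut short.
function parseJsonlTolerant(text: string): { entries: unknown[]; skipped: number } {
  const entries: unknown[] = [];
  let skipped = 0;
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue;
    try {
      entries.push(JSON.parse(line));
    } catch {
      skipped += 1;   // torn or corrupt line; earlier lines are unaffected
    }
  }
  return { entries, skipped };
}

// Two complete entries, then a line that was interrupted mid-write.
const torn =
  '{"type":"message","id":"a1b2c3d4"}\n' +
  '{"type":"message","id":"b2c3d4e5"}\n' +
  '{"type":"mess';

const recovered = parseJsonlTolerant(torn);
// recovered.entries has 2 items; recovered.skipped is 1.
```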

Primary sources

Session format · Compaction · Source: session-manager.ts, compaction.ts.

Chapter 13·Deep dive

Programmatic surfaces: SDK, RPC, JSON

Pi exposes four ways to drive the agent: the interactive TUI, the SDK (same Node process), RPC over stdin/stdout (subprocess), and a one-shot JSON event stream. Each is the same loop wearing a different jacket. This chapter is the reference for the three non-interactive ones.

Choosing a surface

Surface	Use when	Process model
SDK	You're in Node/TS and want type safety, direct state access	In-process
RPC	Driving from another language, need process isolation	Subprocess, JSONL on stdin/stdout
JSON event stream	One-shot prompts piped into scripts	Subprocess, output only
Interactive TUI	Humans at a terminal	Same loop, terminal UI

SDK: the canonical entry point

import {
  AuthStorage, createAgentSession, ModelRegistry, SessionManager
} from "@earendil-works/pi-coding-agent";

const authStorage = AuthStorage.create();
const modelRegistry = ModelRegistry.create(authStorage);

const { session } = await createAgentSession({
  sessionManager: SessionManager.inMemory(),
  authStorage,
  modelRegistry,
});

session.subscribe(event => {
  if (event.type === "message_update"
      && event.assistantMessageEvent.type === "text_delta") {
    process.stdout.write(event.assistantMessageEvent.delta);
  }
});

await session.prompt("What files are in the current directory?");

The `AgentSession` contract

interface AgentSession {
  // Send / queue prompts
  prompt(text: string, options?: PromptOptions): Promise<void>;
  steer(text: string): Promise<void>;     // delivered after current tool
  followUp(text: string): Promise<void>;  // delivered when agent stops

  // Observe
  subscribe(listener: (event: AgentSessionEvent) => void): () => void;
  readonly messages: AgentMessage[];
  readonly isStreaming: boolean;

  // Model state
  setModel(model: Model): Promise<void>;
  setThinkingLevel(level: ThinkingLevel): void;
  cycleModel(): Promise<ModelCycleResult | undefined>;

  // Tree navigation within the current session file
  navigateTree(targetId: string, options?: {
    summarize?: boolean; customInstructions?: string;
    replaceInstructions?: boolean; label?: string;
  }): Promise<{ editorText?: string; cancelled: boolean }>;

  // Context engineering
  compact(customInstructions?: string): Promise<CompactionResult>;
  abortCompaction(): void;
  abort(): Promise<void>;
  dispose(): void;
}

The event vocabulary you'll subscribe to

type AgentSessionEvent =
  // Lifecycle
  | { type: "agent_start" }
  | { type: "agent_end"; messages: AgentMessage[] }
  | { type: "turn_start" }
  | { type: "turn_end"; message: AgentMessage; toolResults: ToolResultMessage[] }
  // Message lifecycle
  | { type: "message_start"; message: AgentMessage }
  | { type: "message_update"; message: AgentMessage; assistantMessageEvent: AssistantMessageEvent }
  | { type: "message_end"; message: AgentMessage }
  // Tool execution
  | { type: "tool_execution_start"; toolCallId: string; toolName: string; args: unknown }
  | { type: "tool_execution_update"; toolCallId: string; toolName: string; args: unknown; partialResult: unknown }
  | { type: "tool_execution_end"; toolCallId: string; toolName: string; result: unknown; isError: boolean }
  // Session
  | { type: "queue_update"; steering: readonly string[]; followUp: readonly string[] }
  | { type: "compaction_start"; reason: "manual" | "threshold" | "overflow" }
  | { type: "compaction_end"; reason: ...; result: CompactionResult | undefined; aborted: boolean; willRetry: boolean }
  | { type: "auto_retry_start"; attempt: number; maxAttempts: number; delayMs: number; errorMessage: string }
  | { type: "auto_retry_end"; success: boolean; attempt: number; finalError?: string };

Defining tools at the SDK layer

import { Type } from "typebox";
import { defineTool, createAgentSession } from "@earendil-works/pi-coding-agent";

const statusTool = defineTool({
  name: "status",
  label: "Status",
  description: "Get system status",
  parameters: Type.Object({}),
  execute: async () => ({
    content: [{ type: "text", text: `Uptime: ${process.uptime()}s` }],
    details: {},
  }),
});

const { session } = await createAgentSession({
  tools: ["read", "bash", "status"],  // include built-ins + custom
  customTools: [statusTool],
});

Built-in tools: read, bash, edit, write, grep, find, ls. Default set: the first four. Pass noTools: "all" to disable everything, noTools: "builtin" to keep only extension and custom tools.

RPC: JSONL over stdin/stdout

Start with pi --mode rpc. Commands go in (one JSON object per line, LF-only — Node's readline is not protocol-compliant because it also splits on U+2028/U+2029). Events come out. Each command may include an id for correlation; the corresponding response echoes the same id.
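
If you are writing a client in Node or TypeScript, one way to stay LF-only is to frame the stream yourself. The sketch below is a hypothetical `LfFramer` that buffers decoded text and splits on "\n" and nothing else, so a U+2028 inside a JSON string stays inside its record. Decoding bytes to text and wiring this to a subprocess are left out.

```typescript
// Sketch: split a decoded text stream into records on "\n" only.
class LfFramer {
  private buffer = "";

  // Feed a chunk; returns every complete line that chunk finishes.
  push(chunk: string): string[] {
    this.buffer += chunk;
    const parts = this.buffer.split("\n");
    this.buffer = parts.pop() ?? "";   // trailing partial line stays buffered
    return parts;
  }
}

const framer = new LfFramer();

// JSON.stringify leaves U+2028 unescaped, so the payload contains it raw.
const payload = JSON.stringify({ type: "prompt", message: "line\u2028separator" });
const half = Math.floor(payload.length / 2);

const firstLines = framer.push(payload.slice(0, half));        // no newline yet: []
const secondLines = framer.push(payload.slice(half) + "\n");   // record completes
const parsed = secondLines.map((l) => JSON.parse(l) as { type: string; message: string });
```

The chunk boundary in the example falls in the middle of the record on purpose: pipes deliver data in arbitrary pieces, so a framer has to buffer rather than assume one chunk is one line.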

Command shapes (selection)

// Send / queue
{"type":"prompt","id":"req-1","message":"Hello"}
{"type":"prompt","message":"Stop and do this","streamingBehavior":"steer"}
{"type":"prompt","message":"After you're done","streamingBehavior":"followUp"}
{"type":"steer","message":"..."}
{"type":"follow_up","message":"..."}
{"type":"abort"}

// State
{"type":"get_state"}
{"type":"get_messages"}
{"type":"get_session_stats"}

// Model
{"type":"set_model","provider":"anthropic","modelId":"claude-sonnet-4-20250514"}
{"type":"cycle_model"}
{"type":"set_thinking_level","level":"high"}

// Compaction / retry
{"type":"compact","customInstructions":"Focus on code changes"}
{"type":"set_auto_compaction","enabled":true}
{"type":"set_auto_retry","enabled":true}

// Session tree
{"type":"new_session"}
{"type":"switch_session","sessionPath":"/path/to/session.jsonl"}
{"type":"fork","entryId":"abc123"}
{"type":"clone"}
{"type":"set_session_name","name":"refactor-auth"}

// Bash through Pi (output is added to LLM context on the NEXT prompt)
{"type":"bash","command":"ls -la"}

Response shape

// success
{"id":"req-1","type":"response","command":"prompt","success":true}

// with data
{"id":"req-2","type":"response","command":"get_state","success":true,
 "data": { "model": {...}, "thinkingLevel":"medium", "isStreaming":false, ... }}

// failure
{"type":"response","command":"set_model","success":false,
 "error":"Model not found: invalid/model"}

A minimal Python client

import subprocess, json

proc = subprocess.Popen(
    ["pi", "--mode", "rpc", "--no-session"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
)

def send(cmd):
    proc.stdin.write(json.dumps(cmd) + "\n"); proc.stdin.flush()

def events():
    for line in proc.stdout:
        yield json.loads(line)

send({"type":"prompt","message":"Hello!"})
for evt in events():
    if evt.get("type") == "message_update":
        d = evt.get("assistantMessageEvent", {})
        if d.get("type") == "text_delta":
            print(d["delta"], end="", flush=True)
    if evt.get("type") == "agent_end":
        print(); break

The extension UI sub-protocol

Extensions can request user interaction (confirm dialogs, selects, free-form input, multi-line editor). In RPC mode these become a request/response sub-protocol on top of the base flow. Requests have type: "extension_ui_request" with a unique id and a method; the client replies with extension_ui_response echoing the same id. Dialog methods (select, confirm, input, editor) block until the client responds. Fire-and-forget methods (notify, setStatus, setWidget, setTitle, set_editor_text) do not expect a response.
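
A client has to decide, per request, whether a reply is owed. The sketch below encodes only what the paragraph above states: the method lists and the echoed id. The reply payload (the selected value, the confirmation flag, and so on) is not described in this chapter, so it is deliberately left out rather than guessed; responseEnvelope is a hypothetical helper name.

```typescript
// Dialog methods block the extension until the client answers.
const DIALOG_METHODS = new Set(["select", "confirm", "input", "editor"]);

interface ExtensionUiRequest {
  type: "extension_ui_request";
  id: string;
  method: string;
}

// Returns the envelope the client must eventually send back,
// or null when the method is fire-and-forget (notify, setStatus, ...).
function responseEnvelope(req: ExtensionUiRequest): { type: "extension_ui_response"; id: string } | null {
  if (!DIALOG_METHODS.has(req.method)) return null;
  return { type: "extension_ui_response", id: req.id };
}

const confirmReply = responseEnvelope({ type: "extension_ui_request", id: "ui-1", method: "confirm" });
const notifyReply = responseEnvelope({ type: "extension_ui_request", id: "ui-2", method: "notify" });
```

The practical consequence: a headless client that ignores dialog requests will stall the extension, so it should at least answer with a default or a cancellation.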

JSON event stream mode

pi --mode json "your prompt" writes the session header plus every AgentSessionEvent to stdout as JSONL, then exits. Same event types as the SDK. Useful for one-shot prompts in shell scripts.

$ pi --mode json "List files" 2>/dev/null | jq -c 'select(.type == "message_end")'

Primary sources

SDK · RPC mode · JSON event stream · examples/sdk

Chapter 14·Deep dive

Pi-to-Pi protocol: full reference implementation #

Chapter 08 explained why peer-to-peer communication matters. This chapter is the protocol. Four tools, two delivery modes, two implementations. Everything below is type-complete TypeScript you can lift into a Pi extension and adapt. None of it requires changes to Pi's core.

The protocol in one page

An agent on the network is identified by a name (free-text, set when the agent joins). Every peer can do four things: enumerate the pool, send a message, await a specific reply, or poll for any new inbound messages. Messages have a stable messageId; replies reference it through inReplyTo.

// Wire types — same shape for in-process and HTTP transports
type AgentName = string;
type MessageId = string;

interface PeerMessage {
  messageId: MessageId;
  inReplyTo?: MessageId;   // present iff this is a reply
  from: AgentName;
  to: AgentName;
  text: string;
  attachments?: { mimeType: string; data: string }[];
  ts: number;
}

interface PeerInbox {
  pending: PeerMessage[];  // messages waiting to be claimed by the LLM
}
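
How a reply relates to the message it answers is easiest to see in code. A minimal sketch using the PeerMessage shape above; makeReply is a hypothetical helper, not part of the talk's reference.

```typescript
import { randomUUID } from "node:crypto";

interface PeerMessage {
  messageId: string;
  inReplyTo?: string;   // present iff this is a reply
  from: string;
  to: string;
  text: string;
  ts: number;
}

// A reply gets its own messageId, points back at the original through
// inReplyTo, and swaps sender and recipient.
function makeReply(original: PeerMessage, text: string): PeerMessage {
  return {
    messageId: randomUUID(),
    inReplyTo: original.messageId,
    from: original.to,
    to: original.from,
    text,
    ts: Date.now(),
  };
}

const question: PeerMessage = {
  messageId: randomUUID(), from: "planner", to: "reviewer",
  text: "Is the auth refactor safe to merge?", ts: Date.now(),
};
const answer = makeReply(question, "Yes, after the session tests pass.");
```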

The four tools (LLM-facing)

// 1. list — enumerate other agents on the network
list_agents(): { agents: AgentName[] }

// 2. send — deliver a prompt to a peer, return the message id
send_to_agent(args: { to: AgentName; text: string; inReplyTo?: MessageId }): { messageId: MessageId }

// 3. await — block until the peer responds to a specific message id
await_reply(args: { messageId: MessageId; timeoutMs?: number }): { reply: PeerMessage | null }

// 4. check — non-blocking poll: return any new inbound messages
check_inbox(): { messages: PeerMessage[] }

That is the entire public surface. Everything else is plumbing.

Implementation A: comms (single device, in-process)

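All Pi instances that run inside the same Node process can share a single in-memory broker. The talk's reference uses a per-process singleton plus a Node EventEmitter. For Pi extensions, you express the same thing as a shared module that all agents import. Instances in separate processes, even on the same machine, each load their own copy of the module and need the HTTP transport from Implementation B.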

// pool.ts — single shared in-process broker (singleton)
import { EventEmitter } from "node:events";
import { randomUUID } from "node:crypto";

class CommsPool {
  private agents = new Map<AgentName, EventEmitter>();
  private inboxes = new Map<AgentName, PeerMessage[]>();

  join(name: AgentName) {
    if (this.agents.has(name)) throw new Error(`Agent ${name} already joined`);
    this.agents.set(name, new EventEmitter());
    this.inboxes.set(name, []);
  }

  leave(name: AgentName) { this.agents.delete(name); this.inboxes.delete(name); }

  list(self: AgentName): AgentName[] {
    return [...this.agents.keys()].filter(n => n !== self);
  }

  send(msg: Omit<PeerMessage, "messageId" | "ts">): MessageId {
    if (!this.agents.has(msg.to)) throw new Error(`Unknown agent: ${msg.to}`);
    const full: PeerMessage = { ...msg, messageId: randomUUID(), ts: Date.now() };
    this.inboxes.get(msg.to)!.push(full);
    this.agents.get(msg.to)!.emit("message", full);
    return full.messageId;
  }

  drain(self: AgentName): PeerMessage[] {
    const inbox = this.inboxes.get(self) ?? [];
    this.inboxes.set(self, []);
    return inbox;
  }

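  awaitReply(self: AgentName, toMessageId: MessageId, timeoutMs: number): Promise<PeerMessage | null> {
    const ee = this.agents.get(self);
    if (!ee) return Promise.resolve(null);  // not joined, or already left
    const isReply = (m: PeerMessage) => m.inReplyTo === toMessageId && m.to === self;

    // The reply may have arrived before await_reply was called: check the inbox first,
    // otherwise a fast peer makes every await run into its timeout.
    const inbox = this.inboxes.get(self) ?? [];
    const early = inbox.findIndex(isReply);
    if (early >= 0) return Promise.resolve(inbox.splice(early, 1)[0]);

    return new Promise(resolve => {
      const onMessage = (m: PeerMessage) => {
        if (!isReply(m)) return;
        // claim it out of the inbox so check_inbox doesn't double-deliver
        // (re-read the map: drain() replaces the array)
        const live = this.inboxes.get(self) ?? [];
        const ix = live.findIndex(x => x.messageId === m.messageId);
        if (ix >= 0) live.splice(ix, 1);
        cleanup(); resolve(m);
      };
      const onTimeout = () => { cleanup(); resolve(null); };
      const t = setTimeout(onTimeout, timeoutMs);
      const cleanup = () => { clearTimeout(t); ee.off("message", onMessage); };
      ee.on("message", onMessage);
    });
  }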
}

// One module-scoped pool, shared by every Pi instance running in this process
export const pool = new CommsPool();

Then the Pi extension that exposes the four tools to the LLM:

// comms-extension.ts
import type { ExtensionAPI } from "@earendil-works/pi-coding-agent";
import { Type } from "typebox";
import { pool } from "./pool";

export default function (pi: ExtensionAPI) {
  // Each Pi gets a name from a flag or env var
  pi.registerFlag("agent-name", { description: "Name on the comms pool", type: "string" });
  const self = (pi.getFlag("agent-name") as string) ?? `agent-${process.pid}`;

  pool.join(self);

  pi.on("session_shutdown", () => pool.leave(self));

  pi.registerTool({
    name: "list_agents",
    label: "List agents",
    description: "List the other agents currently joined to the comms pool.",
    parameters: Type.Object({}),
    execute: async () => ({
      content: [{ type: "text", text: JSON.stringify({ agents: pool.list(self) }) }],
      details: {},
    }),
  });

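  pi.registerTool({
    name: "send_to_agent",
    label: "Send to peer",
    description: "Send a prompt to another agent, or answer one by setting inReplyTo. Returns a messageId you can await.",
    parameters: Type.Object({
      to: Type.String({ description: "Peer agent name" }),
      text: Type.String({ description: "Prompt or message" }),
      // Without this field no message is ever marked as a reply, so await_reply could never resolve
      inReplyTo: Type.Optional(Type.String({ description: "messageId being answered, when this is a reply" })),
    }),
    execute: async (_id, params) => {
      const messageId = pool.send({ from: self, to: params.to, text: params.text, inReplyTo: params.inReplyTo });
      return { content: [{ type: "text", text: JSON.stringify({ messageId }) }], details: {} };
    },
  });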

  pi.registerTool({
    name: "await_reply",
    label: "Await reply",
    description: "Block until the peer replies to a specific messageId, or timeout.",
    parameters: Type.Object({
      messageId: Type.String(),
      timeoutMs: Type.Optional(Type.Integer({ minimum: 1, default: 60_000 })),
    }),
    execute: async (_id, params, signal) => {
      const reply = await Promise.race([
        pool.awaitReply(self, params.messageId, params.timeoutMs ?? 60_000),
        new Promise<null>(resolve => signal.addEventListener("abort", () => resolve(null))),
      ]);
      return { content: [{ type: "text", text: JSON.stringify({ reply }) }], details: {} };
    },
  });

  pi.registerTool({
    name: "check_inbox",
    label: "Check inbox",
    description: "Non-blocking: return any new messages addressed to this agent.",
    parameters: Type.Object({}),
    execute: async () => {
      const messages = pool.drain(self);
      return { content: [{ type: "text", text: JSON.stringify({ messages }) }], details: {} };
    },
  });

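  // Inbound prompt: inject it as a custom message so the LLM sees it on its next turn.
  // Bracket access reaches the pool's private maps; acceptable in a reference, worth a public method later.
  pool["agents"].get(self)!.on("message", (m: PeerMessage) => {
    if (m.inReplyTo) return;  // replies are pulled via await_reply / check_inbox
    // Claim the prompt so a later check_inbox does not deliver it a second time
    const inbox = pool["inboxes"].get(self) ?? [];
    const ix = inbox.findIndex(x => x.messageId === m.messageId);
    if (ix >= 0) inbox.splice(ix, 1);
    pi.sendMessage(
      // The messageId is rendered in the text so the recipient can reference it when replying
      { customType: "comms:inbound",
        content: `[from ${m.from}, messageId ${m.messageId}] ${m.text}`,
        display: true,
        details: { messageId: m.messageId, from: m.from } },
      { deliverAs: "steer", triggerTurn: true }
    );
  });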
}

Implementation B: comms-net (across machines)

For agents on different machines, swap the in-process broker for a tiny HTTP server. The protocol stays identical; only the transport changes. Any HTTP server works; Bun happens to be the talk's choice because of cold-start speed and built-in TypeScript.

// server.ts — start once per pool host
import { serve } from "bun";

const agents = new Map<AgentName, { lastSeen: number }>();
const inboxes = new Map<AgentName, PeerMessage[]>();

function ok(body: unknown) {
  return new Response(JSON.stringify(body), { headers: { "content-type": "application/json" } });
}

serve({
  port: 8787,
  async fetch(req) {
    const url = new URL(req.url);
    const body = req.method === "POST" ? await req.json() : null;

    switch (`${req.method} ${url.pathname}`) {
      case "POST /join": {
        const { name } = body as { name: string };
        agents.set(name, { lastSeen: Date.now() });
        inboxes.set(name, inboxes.get(name) ?? []);
        return ok({ ok: true });
      }
      case "POST /leave": {
        const { name } = body as { name: string };
        agents.delete(name); inboxes.delete(name);
        return ok({ ok: true });
      }
      case "GET /agents": {
        const self = url.searchParams.get("self");
        return ok({ agents: [...agents.keys()].filter(n => n !== self) });
      }
      case "POST /send": {
        const m = body as Omit<PeerMessage, "messageId" | "ts">;
        if (!agents.has(m.to)) return ok({ error: `Unknown agent: ${m.to}` });
        const full: PeerMessage = { ...m, messageId: crypto.randomUUID(), ts: Date.now() };
        inboxes.get(m.to)!.push(full);
        return ok({ messageId: full.messageId });
      }
      case "POST /drain": {
        const { self } = body as { self: string };
        const out = inboxes.get(self) ?? [];
        inboxes.set(self, []);
        return ok({ messages: out });
      }
      case "POST /await": {
        // Long-poll: block server-side until a matching reply arrives or timeout
        const { self, messageId, timeoutMs } = body as { self: string; messageId: string; timeoutMs: number };
        const deadline = Date.now() + timeoutMs;
        while (Date.now() < deadline) {
          const inbox = inboxes.get(self) ?? [];
          const ix = inbox.findIndex(m => m.inReplyTo === messageId && m.to === self);
          if (ix >= 0) {
            const [m] = inbox.splice(ix, 1);
            return ok({ reply: m });
          }
          await Bun.sleep(100);
        }
        return ok({ reply: null });
      }
      default:
        return new Response("not found", { status: 404 });
    }
  },
});
console.log("comms-net listening on http://localhost:8787");

And the client side, which mirrors the in-process pool method for method but returns Promises:

// net-pool.ts — client that the extension uses in place of CommsPool
class NetPool {
  constructor(private base: string) {}
  private async post(path: string, body: unknown) {
    const r = await fetch(`${this.base}${path}`, {
      method: "POST", headers: { "content-type": "application/json" },
      body: JSON.stringify(body),
    });
    return r.json();
  }
  async join(name: AgentName) { return this.post("/join", { name }); }
  async leave(name: AgentName) { return this.post("/leave", { name }); }
  async list(self: AgentName): Promise<AgentName[]> {
    const r = await fetch(`${this.base}/agents?self=${encodeURIComponent(self)}`);
    return (await r.json()).agents;
  }
  async send(m: Omit<PeerMessage, "messageId" | "ts">): Promise<MessageId> {
    const { messageId, error } = await this.post("/send", m);
    if (error) throw new Error(error);
    return messageId;
  }
  async drain(self: AgentName): Promise<PeerMessage[]> {
    const { messages } = await this.post("/drain", { self });
    return messages;
  }
  async awaitReply(self: AgentName, messageId: MessageId, timeoutMs: number) {
    const { reply } = await this.post("/await", { self, messageId, timeoutMs });
    return reply as PeerMessage | null;
  }
}

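// Swapping `pool = new CommsPool()` for
// `pool = new NetPool(process.env.COMMS_NET_URL ?? "http://localhost:8787")`
// is most of the change, but not all of it: every NetPool method returns a
// Promise, so the extension must `await` join, leave, list, send, and drain,
// and the `pool["agents"]` EventEmitter subscription must become a periodic drain() poll.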

Failure modes and what to do about them

Failure	Symptom	Mitigation
Peer crashed mid-conversation	`await_reply` times out, no error from the broker	Bound every await with a sane `timeoutMs`; have the agent prompt fall back to "peer unavailable, proceed without confirmation."
Network partition (comms-net)	Sends succeed locally but never reach peers; long-poll never returns	Heartbeat: agents POST `/join` every N seconds. Server evicts entries past lastSeen + 3N. List excludes evicted names.
Tight reply loops	Two agents prompt each other indefinitely; token spend climbs	End-state in the prompt ("reply `DONE` when the answer is final"). Cap turns: refuse to send if the conversation graph exceeds N exchanges.
PII leakage across peers	One peer holds sensitive data; another asks for it	Per-agent system-prompt rules. Wrap the bash tool with a `tool_call` hook that scrubs known patterns. Treat peers as untrusted by default.
Replay / duplicate delivery	Same messageId appears twice in an inbox	Idempotency on the LLM side: include the messageId in the rendered prompt, instruct "ignore messages whose messageId you have already replied to."
Authorization	Arbitrary processes can POST to the server	Bearer token from env var on every request. TLS for cross-host. The reference is intentionally bare; production needs both.
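
The heartbeat row is the one mitigation the reference server half-implements: it stores lastSeen but never evicts. A minimal sketch of the missing sweep, assuming the same agents map as server.ts; evictStale and the 10-second interval are illustrative choices, not part of the talk's code.

```typescript
type AgentName = string;

// heartbeatMs is "N" from the table: agents re-POST /join every N ms,
// and anything silent for more than 3N is dropped from the pool.
function evictStale(
  agents: Map<AgentName, { lastSeen: number }>,
  now: number,
  heartbeatMs: number,
): AgentName[] {
  const evicted: AgentName[] = [];
  for (const [name, info] of agents) {
    if (now - info.lastSeen > 3 * heartbeatMs) {
      agents.delete(name);   // deleting during Map iteration is safe in JavaScript
      evicted.push(name);
    }
  }
  return evicted;
}

const demoAgents = new Map<AgentName, { lastSeen: number }>([
  ["alice", { lastSeen: 0 }],        // silent for 31 s
  ["bob", { lastSeen: 25_000 }],     // silent for 6 s
]);
const dropped = evictStale(demoAgents, 31_000, 10_000);
```

A server would call this on a timer and also decide what happens to the evicted agent's inbox; dropping it is simplest, keeping it lets a reconnecting agent catch up.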

Why these four tools and not more

Sub-agent delegation, message-queue brokers, and pipelines (agent chains) all collapse into these four primitives. Sub-agent delegation: parent agent sends then awaits; the child checks on its own loop. Message broker: one agent is the only send target; it routes by inspecting messages. Pipeline: each stage awaits the previous and sends to the next. The four tools subsume the patterns; the patterns do not subsume the tools.

The honest read on this code

The reference above will not survive production without auth, TLS, heartbeats, idempotency, and a permission boundary on bash. The talk's framing — "read and adapt, throw your agents at it" — is correct. The point of the four-tool API is that adapting only requires changing the transport. The contract the LLM sees is stable.
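
Of the missing pieces, the bearer-token check is the cheapest to add. A minimal sketch, assuming the token comes from an environment variable of your choosing; isAuthorized is a hypothetical helper, and TLS still has to be handled separately.

```typescript
import { timingSafeEqual } from "node:crypto";

// Compares the presented bearer token with the expected one in constant time.
// An empty expected token is treated as "auth not configured" and always fails.
function isAuthorized(authorizationHeader: string | null, expectedToken: string): boolean {
  if (expectedToken.length === 0) return false;
  const header = authorizationHeader ?? "";
  const presented = header.startsWith("Bearer ") ? header.slice("Bearer ".length) : "";
  const a = Buffer.from(presented);
  const b = Buffer.from(expectedToken);
  // timingSafeEqual throws on length mismatch, so check lengths first
  return a.length === b.length && timingSafeEqual(a, b);
}

const good = isAuthorized("Bearer s3cret", "s3cret");
const wrong = isAuthorized("Bearer nope", "s3cret");
const missing = isAuthorized(null, "s3cret");
const unconfigured = isAuthorized("Bearer ", "");
```

In server.ts this would run at the top of fetch, with req.headers.get("authorization") as the first argument and a 401 response when it returns false.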

Cross-reference

The conceptual case for peer-to-peer (and the two demos that motivated this protocol) is in Chapter 08. The hooks this extension relies on (registerTool, sendMessage, session_shutdown) are in Chapter 11. The agent loop that calls these tools is in Chapter 10.

Chapter 15·Deep dive

Reconstruction recipe #

If Pi vanished tomorrow, how would you rebuild this stack? In what order? With what shortcuts? This chapter is a build sequence calibrated to "minimum viable harness in a weekend, production-grade in a quarter."

Build order, eight steps

One model client for one provider. Anthropic Messages or OpenAI Responses both ship streaming, tool-use, and vision. Pick one. Implement complete() against its HTTP API as an async generator that yields text_delta, toolcall_delta, and a final done event. Stop. Do not abstract over providers yet.
A tool registry with three tools: read, write, bash. Validate parameters with TypeBox or Zod. Return { content, details, isError }. Resist the urge to add edit / grep / find until the loop is running.
The agent loop from Chapter 10. Call the model, append the assistant message, run any tool calls, append the tool results, loop. ~60 lines of code. You now have a working agent.
A JSONL session store with one entry type (message) and a parentId field that is always the previous entry. Persist on every append. Don't implement the tree yet; just append linearly. You can replay sessions and resume them.
An extension bus with five hooks: session_start, before_agent_start, tool_call, tool_result, agent_end. That covers ~80% of useful extensions (permission gates, redaction, observability, injection). Load extensions from one directory; treat them as default-exported factories that receive your ExtensionAPI.
Compaction. Walk back from the leaf collecting tokens; if you exceed contextWindow - reserveTokens, summarize everything earlier than keepRecentTokens with a structured prompt; append a compaction entry; rebuild context from the summary plus the kept tail. Don't implement branch summaries yet.
The tree. Switch parentId from "previous entry" to "actual parent." Add branch(entryId) to move the leaf back. Add a branch_summary entry type for navigation. You now have undo, fork, and clone for free.
One non-interactive surface. Pick RPC or JSON event stream. The contract is "JSON in, JSON out, line-delimited." Once you ship one, the other is a small variant. Save the SDK and the full TUI for last; they are the most code per unit of capability.
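
Steps four and seven differ by a single field, which is easier to see in code than in prose. A minimal in-memory sketch under stated assumptions: SessionTree, its sequential ids, and the entry fields are illustrative, not Pi's session format, and a real store would append each line to a file as it is written.

```typescript
interface Entry {
  id: string;
  parentId: string | null;
  type: "message";
  role: "user" | "assistant";
  text: string;
}

class SessionTree {
  private byId = new Map<string, Entry>();
  private log: Entry[] = [];           // append-only, in write order
  private leaf: string | null = null;  // where the next entry attaches

  append(role: Entry["role"], text: string): Entry {
    const entry: Entry = { id: String(this.log.length + 1), parentId: this.leaf, type: "message", role, text };
    this.byId.set(entry.id, entry);
    this.log.push(entry);
    this.leaf = entry.id;
    return entry;
  }

  // Step seven: move the leaf back. Nothing is deleted; the old branch stays in the log.
  branch(entryId: string): void {
    if (!this.byId.has(entryId)) throw new Error(`Unknown entry: ${entryId}`);
    this.leaf = entryId;
  }

  // The context the model sees: walk parent pointers from the leaf to the root.
  path(): Entry[] {
    const out: Entry[] = [];
    let id = this.leaf;
    while (id !== null) {
      const entry = this.byId.get(id)!;
      out.push(entry);
      id = entry.parentId;
    }
    return out.reverse();
  }

  toJsonl(): string {
    return this.log.map(e => JSON.stringify(e)).join("\n") + "\n";
  }
}

const session = new SessionTree();
session.append("user", "Add a login form");           // id 1
session.append("assistant", "Done, see form.tsx");    // id 2
session.append("user", "Now add OAuth");              // id 3
session.branch("2");                                  // undo the OAuth request
session.append("user", "Actually, add tests first");  // id 4, parent 2
const activeIds = session.path().map(e => e.id);
const jsonlLines = session.toJsonl().trimEnd().split("\n");
```

With a linear store, parentId is always the previous entry and path() returns the whole log. Once branch() exists, the same file holds every abandoned branch, which is what makes undo and fork cheap.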

What to defer

Multiple providers. One is enough until users ask for a second.
OAuth. API keys cover the first 95% of use cases.
Themes, custom renderers, TUI components. They are nice; they are not the loop.
Sub-agent delegation. Build peer-to-peer first; sub-agent is a special case (see Chapter 14).
MCP. Tools you control with READMEs and CLI flags cover the same ground; see Mario Zechner's essay on why.

What to invest in early

Session as JSONL. Pays back the day you have a crash you can't reproduce.
Hooks for tool_call and tool_result. Every safety, observability, and customization extension lives here.
A pre-flight permission gate on bash. Cheap to add, expensive to skip.
Compaction with a structured summary. Long conversations are the default. Free-form summaries collapse into mush by turn 50.
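
The compaction decision itself is a small pure function, separate from the summary prompt. A minimal sketch of the cut-point rule described in the build order (contextWindow, reserveTokens, keepRecentTokens); planCompaction is a hypothetical name and the token counts are assumed to be estimates supplied by the caller.

```typescript
interface CompactionOptions {
  contextWindow: number;     // model limit
  reserveTokens: number;     // headroom for the next response
  keepRecentTokens: number;  // tail that is kept verbatim
}

// tokenCounts[i] is the estimated size of entry i, oldest first.
// Returns the index of the first entry to keep verbatim (everything before it
// gets summarized), or null when the session still fits.
function planCompaction(tokenCounts: number[], o: CompactionOptions): number | null {
  const total = tokenCounts.reduce((a, b) => a + b, 0);
  if (total <= o.contextWindow - o.reserveTokens) return null;

  let kept = 0;
  let firstKept = tokenCounts.length;
  // Walk back from the leaf, keeping entries while they fit in the recent-token budget.
  for (let i = tokenCounts.length - 1; i >= 0; i--) {
    if (kept + tokenCounts[i] > o.keepRecentTokens) break;
    kept += tokenCounts[i];
    firstKept = i;
  }
  return firstKept;
}

const sizes = [400, 300, 200, 100];
const cut = planCompaction(sizes, { contextWindow: 1000, reserveTokens: 200, keepRecentTokens: 350 });
const noCut = planCompaction(sizes, { contextWindow: 2000, reserveTokens: 200, keepRecentTokens: 350 });
```

Here the four entries total 1,000 tokens against a budget of 800, so the first two entries are summarized and the last two (300 tokens) are kept as-is.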

A minimum viable stack, in files

my-harness/
├── package.json
├── src/
│   ├── index.ts              # Entry point: parse args, build session, run loop
│   ├── loop.ts               # The agent loop from Chapter 10
│   ├── model/
│   │   └── anthropic.ts      # complete() against Anthropic Messages
│   ├── tools/
│   │   ├── registry.ts       # Tool definition + active set
│   │   ├── read.ts
│   │   ├── write.ts
│   │   └── bash.ts
│   ├── session/
│   │   ├── store.ts          # Append-only JSONL, parentId pointers
│   │   └── context.ts        # buildContext() with compaction
│   ├── compaction/
│   │   └── summarize.ts      # Structured-summary prompt + call
│   ├── extensions/
│   │   ├── api.ts            # ExtensionAPI surface
│   │   ├── bus.ts            # Hook dispatch
│   │   └── loader.ts         # Read ~/.my-harness/extensions/*.ts
│   └── modes/
│       ├── interactive.ts    # Optional, last
│       └── rpc.ts            # JSON in, JSON out
└── examples/
    └── extensions/
        ├── permission-gate.ts
        ├── redact-pii.ts
        └── comms.ts          # The four-tool peer-to-peer extension

~2,000 lines of TypeScript gets you a working harness. The remaining 20,000 lines that go into a polished tool like Pi are TUI components, settings management, OAuth flows, custom-provider quirks, the remaining built-in tools (edit, grep, find, ls), theme system, package manager, RPC extension UI sub-protocol, and so on. Each of those is independent of the loop.

Three checkpoints to know you're on track

You can replay a session. Load a JSONL file, walk the entries, hand the model identical messages, get an identical-shaped (not bit-identical) response.
You can write a one-file extension that blocks rm -rf without editing the core.
Two of your harnesses can hold a conversation. Use the four-tool protocol from Chapter 14. If they can collaborate to solve a task, the loop, the session store, and the extension bus all work.
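
The second checkpoint is small enough to show whole. A minimal sketch for your own harness, not Pi's extension API: HookBus, the verdict shape, and permissionGate are illustrative names, and the regular expression is deliberately crude.

```typescript
type ToolCall = { toolName: string; args: Record<string, unknown> };
type Verdict = { block: true; reason: string } | undefined;
type ToolCallHook = (call: ToolCall) => Verdict;

// The core owns this class and never needs to know which extensions exist.
class HookBus {
  private toolCallHooks: ToolCallHook[] = [];

  onToolCall(hook: ToolCallHook): void {
    this.toolCallHooks.push(hook);
  }

  // Called by the loop before every tool execution. The first blocking hook wins.
  checkToolCall(call: ToolCall): Verdict {
    for (const hook of this.toolCallHooks) {
      const verdict = hook(call);
      if (verdict?.block) return verdict;
    }
    return undefined;
  }
}

// The "one-file extension": in a real harness this would be the default export
// of examples/extensions/permission-gate.ts.
function permissionGate(bus: HookBus): void {
  const recursiveForceRm = /\brm\s+-[a-zA-Z]*r[a-zA-Z]*f|\brm\s+-[a-zA-Z]*f[a-zA-Z]*r/;
  bus.onToolCall(call => {
    if (call.toolName !== "bash") return undefined;
    const command = String(call.args.command ?? "");
    if (recursiveForceRm.test(command)) {
      return { block: true, reason: `Refusing recursive force delete: ${command}` };
    }
    return undefined;
  });
}

const bus = new HookBus();
permissionGate(bus);
const blocked = bus.checkToolCall({ toolName: "bash", args: { command: "rm -rf /tmp/build" } });
const allowed = bus.checkToolCall({ toolName: "bash", args: { command: "ls -la" } });
const otherTool = bus.checkToolCall({ toolName: "read", args: { path: "rm -rf.txt" } });
```

A pattern match like this catches accidents, not adversaries; a determined model can spell the same command many ways, which is why the recipe calls it a pre-flight gate rather than a sandbox.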

The point of the recipe

You are not rebuilding Pi. You are proving to yourself that the architecture in Chapter 10 is small enough to internalize. Once you have, the question "should we adopt or build" answers itself per situation. For most teams, the answer is "adopt and extend." The reason that answer is comfortable is that you know what you would have built.

Where to start reading the real source

For the actual Pi implementation, the most informative entry points are session-manager.ts, compaction.ts, and the examples/extensions folder. Read them in that order.

Chapter 16·Comparison

Claude Code as the floor #

Claude Code is the most polished agentic coding tool on the market. It ships with batteries included: a curated toolset, a permission system, a 4-level memory hierarchy, hooks, skills, MCP, sub-agents, sessions with auto-compaction. To honor the talk's framing, this chapter takes Claude Code seriously on its own terms before contrasting it with Pi.

Why "the floor" is not a slight

"Floor" here means baseline of what's possible, not "low quality." Claude Code is what most senior engineers should start with. The talk's argument is that Claude Code is also where most engineers stop, and the gap between low- and high-performing agentic engineers shows up when you push past what your harness ships with. The rest of the manual is about that ceiling. This chapter is about the floor it rests on.

The Claude Code architecture in one paragraph

Claude Code runs the same agent loop as Pi (see Chapter 10). The differences live in what's wired into the loop by default. Anthropic does the wiring; you customize within the boundary they expose. A continuous loop reads your message, assembles context (git status + 4 levels of CLAUDE.md + current date + tool list, all memoized), calls the Anthropic API with the active tool set, runs each tool call after a permission check, appends the result, and loops until the model emits a turn with no tool calls. Hooks can fire on lifecycle events. MCP servers can add external tools. Sub-agents can be spawned via the Task tool. Sessions are JSON transcripts in ~/.claude/, resumed by session ID, periodically compacted.

What you get out of the box

Built-in tools

A curated set, much larger than Pi's. Read (handles PDFs and notebooks too), Edit (exact string replacement with uniqueness check), Write, Glob, Grep (ripgrep-backed), LS, Bash (persistent shell session with compound-command checks and background execution), WebFetch (HTTPS-upgraded with 15-min cache, runs a secondary model to extract), WebSearch (auto-appends Sources:), Task (spawn sub-agents), TodoWrite (the structured task list you see during agent runs), NotebookEdit. MCP-provided tools appear with the mcp__ prefix.

4-level memory hierarchy via CLAUDE.md

The most distinctive Claude Code feature. Memory files load from lowest to highest priority:

Managed: /etc/claude-code/CLAUDE.md + rules/ — admin-set, can be policy-enforced
User: ~/.claude/CLAUDE.md + ~/.claude/rules/*.md — your global preferences
Project: CLAUDE.md + .claude/CLAUDE.md + .claude/rules/*.md in every ancestor directory — team-shared, committed
Local: CLAUDE.local.md — personal project overrides, gitignored

Files closer to cwd load later, so they win. @include directives pull in other files (up to 5 levels deep, circular refs detected). Rule files in .claude/rules/ support path-scoped frontmatter — a rule for src/api/** only injects when Claude touches matching files. Max file size: 40,000 chars. Loaded files are prefixed with "These instructions OVERRIDE any default behavior and you MUST follow them exactly as written."
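
The "closer to cwd loads later" rule for the project level can be sketched as a path walk. This illustrates the ordering only and is not Claude Code's implementation: projectMemoryCandidates is a hypothetical name, the relative order of CLAUDE.md and .claude/CLAUDE.md within one directory is an assumption, and rule files, @include resolution, and the other three levels are left out.

```typescript
import { posix as path } from "node:path";

// Lists candidate project memory files from the filesystem root down to cwd.
// Later entries load later, so they win when instructions conflict.
function projectMemoryCandidates(cwd: string): string[] {
  const dirs: string[] = [];
  let dir = path.resolve(cwd);
  while (true) {
    dirs.push(dir);
    const parent = path.dirname(dir);
    if (parent === dir) break;  // reached the root
    dir = parent;
  }
  dirs.reverse();               // root first, cwd last
  return dirs.flatMap(d => [path.join(d, "CLAUDE.md"), path.join(d, ".claude", "CLAUDE.md")]);
}

const candidates = projectMemoryCandidates("/repo/src");
```

A loader would filter this list down to the files that exist and concatenate them in order, which is why a CLAUDE.md next to the code you are editing can override one at the repository root.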

Permissions with four modes and rule matching

Every tool call passes through checkPermissions. Result is allow / ask / deny. The active mode sets the default:

default — prompt on potentially dangerous ops; auto-approve read-only
acceptEdits — auto-approve Edit and Write, still prompt on bash
plan — read-only; all writes and bash blocked; Claude can ExitPlanMode to request approval
bypassPermissions — disable all checks (only for sandboxed/automated runs)

Allow/deny rules with wildcard matching layer on top. Bash compound commands (&&, ||, ;, |) are split and each part is checked independently — most restrictive result wins. Output redirections outside the project, cd outside the working tree, sed -i, and writes to .claude/ or .git/ get extra scrutiny regardless of mode.
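
"Split, check each part, most restrictive wins" is a short algorithm. The sketch below illustrates that rule and nothing more; it is not Claude Code's checker. The names are hypothetical, the demo rule is invented for the example, and the naive split ignores quoting and subshells, which a real implementation has to handle.

```typescript
type Decision = "allow" | "ask" | "deny";
const RANK: Record<Decision, number> = { allow: 0, ask: 1, deny: 2 };

// Naive split on &&, ||, ; and |. Operators inside quotes are NOT respected.
function splitCompound(command: string): string[] {
  return command.split(/&&|\|\||;|\|/).map(s => s.trim()).filter(s => s.length > 0);
}

// Each part is checked on its own; the most restrictive verdict is the result.
function checkCompound(command: string, checkOne: (part: string) => Decision): Decision {
  const parts = splitCompound(command);
  if (parts.length === 0) return "ask";  // nothing recognizable: do not auto-approve
  let worst: Decision = "allow";
  for (const part of parts) {
    const d = checkOne(part);
    if (RANK[d] > RANK[worst]) worst = d;
  }
  return worst;
}

// Invented rule set for the demo: listing and git status are fine, rm is forbidden.
const demoRule = (part: string): Decision =>
  part === "ls" || part.startsWith("ls ") || part === "git status" ? "allow"
  : part.startsWith("rm ") ? "deny"
  : "ask";

const allAllowed = checkCompound("ls -la && git status", demoRule);
const needsPrompt = checkCompound("ls | grep TODO", demoRule);
const denied = checkCompound("git status; rm -rf build", demoRule);
```

The design point is that an allow rule for one command can never be used to smuggle in a second one: the chain is only as permissive as its least trusted part.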

Hooks: automation on lifecycle events

Configured in settings.json. Each hook binds to an event (PreToolUse, PostToolUse, Stop, SessionStart, UserPromptSubmit, PreCompact, ...) with an optional matcher. The hook is a shell command, HTTP POST, LLM prompt, or full agent invocation. Exit code controls behavior: 0 succeed, 2 block (and show stderr to Claude), other exit codes show stderr to you. This is the customization surface for most users.

Skills

Markdown files in .claude/skills/. Frontmatter has description, argument-hint, allowed-tools, when_to_use, model (per-skill model override), paths (path-activated), context: fork (run in isolated subagent), hooks (skill-scoped hooks). Invoke with /skill-name. $ARGUMENTS substitutes the text after the command. Inline shell with !`command` runs at invocation time and injects output. Bundled skills ship in the binary. Path-activated skills auto-load when Claude touches matching files.

MCP servers

Model Context Protocol — connect external services. Configure in .mcp.json (project) or ~/.claude.json (user). Three transports: stdio (local subprocess), HTTP, SSE. Add with claude mcp add <name> -- <command>. Manage with /mcp enable, /mcp disable, /mcp reconnect. Tools from a server appear as mcp__<server>__<tool> and follow the same permission system. Anthropic and the community maintain a registry at modelcontextprotocol.io.

Multi-agent via the `Task` tool

Claude can spawn a sub-agent. Each gets a fresh context window, a specialized system prompt (per subagent_type), its own tool permissions, and runs to completion before reporting back. Modes: foreground (blocks parent), background (async, notification on completion), isolation: "worktree" (own git worktree). Persistent agent memory via ~/.claude/agent-memory/<agent-type>/MEMORY.md. Results capped at 100,000 chars. Sub-agents cannot themselves spawn teammates (flat roster); fork agents cannot fork (no recursive forking).

Sessions and compaction

JSON transcripts on disk in ~/.claude/. Each conversation has a unique session ID. Resume with --resume <id> or --resume alone for a picker. On resume, memory files are re-discovered and may differ; permission mode resets to configured default. Long conversations are periodically compacted — oldest messages summarized to keep the window manageable; the raw transcript is always preserved on disk.

Settings with four scopes

Global (~/.claude/settings.json), project (.claude/settings.json, committed), local (.claude/settings.local.json, not committed), managed (platform-specific MDM path). Merge from lowest to highest; managed wins last. Settings cover model, permissions, hooks, env vars, MCP allowlist, cleanup, worktree symlinks, attribution text, language, sandbox config. Managed-only locks: allowManagedHooksOnly, allowManagedPermissionRulesOnly, strictPluginOnlyCustomization.

Slash commands and CLI flags

CLI flags configure the session at launch (--model, --permission-mode, -p for non-interactive print, --mcp-config). Slash commands control the running session (/help, /init, /compact, /model, /permissions, /memory, /skills, /mcp, /hooks, /config). Built-in commands plus skills plus plugin commands all appear in /help.

Subcommands at the shell

claude mcp (configure servers), claude mcp serve (run Claude Code itself as an MCP server — neat for embedding), claude doctor (diagnose installation), claude update.

What "floor" means concretely

Three things are out of reach inside Claude Code by design:

You cannot replace the agent loop. The loop is Anthropic's. You can intercept around it (hooks, MCP) but you cannot rewrite the steps.
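You cannot switch model families. Claude Code is built to drive Claude models. apiKeyHelper and forceLoginMethod let you change credentials, and Anthropic documents routes to Claude through Amazon Bedrock and Google Vertex AI; none of that lets you point the loop at OpenAI models, Ollama, or an arbitrary in-house model.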
You cannot define new lifecycle hooks. The hook events are a fixed enum. If you want to fire on something the enum doesn't cover, you wait for Anthropic to add it.

These are the boundaries of the floor. For most engineers, on most days, the boundaries are invisible. For the engineers in the talk's "top 2%" framing, the boundaries are exactly where the leverage lives.

The honest read

Claude Code is the floor in the same sense that a great cookbook is the floor of cooking. You can produce excellent results indefinitely without ever leaving the cookbook. The chef who writes new recipes does so because they understand the constraints the cookbook imposes and have a reason to push past them. Most days, follow the recipe. Some days, write your own.

Primary sources

Anthropic's official Claude Code docs · the community Claude Code wiki this chapter is built on (mintlify.wiki/VineeTagarwaL-code/claude-code) · specifically: how-it-works, tools, memory-context, permissions, hooks, skills, MCP servers, multi-agent.

Chapter 17·Comparison

Pi vs Claude Code, side-by-side #

Same loop, different philosophies. Claude Code ships features; Pi ships primitives. This chapter compares the implementations for every subsystem we covered in the deep dive. The pattern repeats: Claude Code answers "what feature do you want?", Pi answers "what primitive do you need?"

One-line summary

Claude Code curates a coherent experience around an opinionated loop. Pi exposes the loop and lets you build the experience. — The thesis of every row below

1. The agent loop

Both run the universal loop from Chapter 10: assemble context, call model, run tool calls (after permission check), append results, repeat until no tool calls.

Dimension	Claude Code	Pi
Loop ownership	Anthropic's. Closed source. You intercept around it.	Yours via SDK; open source on github.com/earendil-works/pi-mono.
Per-turn budgets	Token + tool-call budgets enforced by the query engine.	Implicit; controlled by the model and your extensions.
Tool-result oversize handling	Each tool has `maxResultSizeChars`; overflow saved to temp file, preview + path returned.	Tool implementer's responsibility; `fullOutputPath` on `BashExecutionMessage` is the same idea, surfaced explicitly.
Background execution	Background bash with `run_in_background: true` + notification.	"No background bash. Use tmux for full observability."
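
The universal loop described at the top of this section fits in a few dozen lines. The sketch below is a minimal illustration under stated assumptions, not the source of either harness: every type and name here (`Message`, `Model`, `runAgentLoop`, the `maxTurns` budget) is hypothetical, and the model is a scripted stand-in.

```typescript
// Hypothetical types for illustration; not Pi's or Claude Code's real API.
type ToolCall = { id: string; name: string; input: Record<string, unknown> };
type Message =
  | { role: "user"; text: string }
  | { role: "assistant"; text: string; toolCalls: ToolCall[] }
  | { role: "tool"; callId: string; text: string; isError: boolean };

type Model = (context: Message[]) => Promise<{ text: string; toolCalls: ToolCall[] }>;
type Tool = (input: Record<string, unknown>) => Promise<string>;
type PermissionCheck = (call: ToolCall) => boolean;

async function runAgentLoop(
  prompt: string,
  model: Model,
  tools: Record<string, Tool>,
  allow: PermissionCheck,
  maxTurns = 20, // stand-in for a turn budget; the value is arbitrary
): Promise<Message[]> {
  const messages: Message[] = [{ role: "user", text: prompt }];
  for (let turn = 0; turn < maxTurns; turn++) {
    // 1. Assemble context (here: the whole transcript) and call the model.
    const reply = await model(messages);
    messages.push({ role: "assistant", text: reply.text, toolCalls: reply.toolCalls });
    // 2. No tool calls means the model is done.
    if (reply.toolCalls.length === 0) break;
    // 3. Run each tool call after the permission check, then append the result.
    for (const call of reply.toolCalls) {
      const tool = tools[call.name];
      let text = "";
      let isError = false;
      if (!tool) {
        text = `unknown tool: ${call.name}`;
        isError = true;
      } else if (!allow(call)) {
        text = `blocked by policy: ${call.name}`;
        isError = true;
      } else {
        try {
          text = await tool(call.input);
        } catch (err) {
          text = String(err);
          isError = true;
        }
      }
      messages.push({ role: "tool", callId: call.id, text, isError });
    }
  }
  return messages;
}

// Usage: a scripted fake model that asks for one tool call, then finishes.
const fakeModel: Model = async (context) =>
  context.some((m) => m.role === "tool")
    ? { text: "done", toolCalls: [] }
    : { text: "", toolCalls: [{ id: "c1", name: "echo", input: { value: "hi" } }] };

const transcript = runAgentLoop(
  "say hi",
  fakeModel,
  { echo: async (input) => String(input["value"]) },
  () => true,
);
```

Blocked and failed calls are appended as error results rather than thrown, so the model sees the refusal and can adjust. That is a design choice of this sketch, not a claim about either product.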

2. Context loading and memory files

Dimension	Claude Code	Pi
What auto-loads	Git status (branch, recent commits, working tree), current date, all CLAUDE.md files in the 4-level hierarchy, the tool list. Memoized via lodash.	Minimal system prompt by design. AGENTS.md files walked up from cwd. Current date and tool list assembled. Custom system prompt via SYSTEM.md or extension.
Memory file format	CLAUDE.md. Supports `@include` directives (up to 5 levels). Path-scoped rules via frontmatter on `.claude/rules/*.md`.	AGENTS.md. Simpler scope: one file per directory in the walk.
Scope levels	4: managed, user, project, local. Files closer to cwd load later (win the cascade).	2: `~/.pi/agent/AGENTS.md` (global), AGENTS.md in cwd and ancestors (project). Plus per-project SYSTEM.md to replace or append.
Path-scoped activation	Yes: `paths: ["src/api/**"]` in rule frontmatter; only injects when Claude touches matching files.	Same idea reached through skill frontmatter and extension `context` hook.
"Override" framing	Files prefixed with strong language that overrides defaults.	No framing; you control the system prompt entirely.
Disabling	`CLAUDE_CODE_DISABLE_CLAUDE_MDS=1`, `--bare`, `claudeMdExcludes` setting.	Just don't put an AGENTS.md there.

Claude Code's design adds prescription (the hierarchy, the override prefix, the @include directive). Pi's design subtracts to the minimum needed and gives you the system-prompt knob directly. Both end in the same place; one gets you there with a recipe, one gives you the ingredients.
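
To make "walked up from cwd" concrete, here is a small sketch that lists where a harness might look for AGENTS.md files. It is an illustration only: `agentsFileCandidates` is a hypothetical helper, it assumes POSIX-style paths, and the ordering (global file first, then root-most ancestor down to cwd) is an assumption for the example rather than documented Pi behavior.

```typescript
// Sketch only: candidate AGENTS.md locations for a POSIX-style cwd.
// The ordering is an assumption for illustration, not documented Pi behavior.
function agentsFileCandidates(cwd: string, home: string): string[] {
  const parts = cwd.split("/").filter((p) => p.length > 0);
  const result: string[] = [`${home}/.pi/agent/AGENTS.md`]; // global file
  // Build "/", "/a", "/a/b", ... so files nearer to cwd come last.
  for (let depth = 0; depth <= parts.length; depth++) {
    const dir = "/" + parts.slice(0, depth).join("/");
    result.push(dir === "/" ? "/AGENTS.md" : `${dir}/AGENTS.md`);
  }
  return result;
}

const walk = agentsFileCandidates("/work/app/src", "/home/me");
// walk is:
//   /home/me/.pi/agent/AGENTS.md
//   /AGENTS.md
//   /work/AGENTS.md
//   /work/app/AGENTS.md
//   /work/app/src/AGENTS.md
```

A real loader would check which of these files exist and concatenate the ones that do.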

3. Built-in tools

Tool	Claude Code	Pi
Read file	`Read` — text + PDF + image + Jupyter	`read` — text + image; PDF + notebooks via extension/skill
Find files	`Glob`	`find`
Search content	`Grep` (ripgrep)	`grep` (ripgrep)
Edit	`Edit` (exact string replace, uniqueness enforced) + `Write`	`edit` + `write`
Shell	`Bash` persistent session, compound-command checks, background mode	`bash`; persistent shell behavior; "no background bash"
Directory listing	`LS`	`ls`
Web	`WebFetch` (HTTPS upgrade, 15-min cache, secondary model extracts), `WebSearch`	Not built-in. Available via skills/extensions; community packages exist.
Sub-agent	`Task` — spawns a sub-agent with isolated context	Not built-in. Sub-agent delegation via tmux or via an extension. Peer-to-peer via the four-tool comms protocol (Chapter 14).
Structured todos	`TodoWrite` — renders in a panel	Not built-in. "Use a TODO.md file."
Notebooks	`NotebookEdit`	Skill/extension
External tools	MCP — auto-discovered tools with `mcp__` prefix	Skills with CLI scripts; or build an MCP extension. "What if you don't need MCP?"

4. Permission model

Dimension	Claude Code	Pi
Where permissions live	First-class subsystem with built-in modes.	Extension-implemented. Reference examples: permission-gate.ts, protected-paths.ts.
Modes	4 named modes: `default`, `acceptEdits`, `plan`, `bypassPermissions` (+ `dontAsk`, experimental `auto`).	No built-in modes. Whatever your `tool_call` hook does is your policy.
Rule syntax	Allow/deny/ask lists with wildcard matching: `"Bash(git *)"`, `"mcp__server__tool"`.	Arbitrary TypeScript in a `tool_call` handler. More expressive, less declarative.
Compound-command handling	Built-in: split `&&`/`;`/`\|`, check each, most restrictive wins.	You implement this in your handler. Reference snippet in Chapter 11.
Plan mode	Built-in: read-only; `ExitPlanMode` tool to request approval.	Build via extension or install a package.
Bypass	`bypassPermissions` mode with documented warnings.	"Run in a container, or build your own confirmation flow with extensions."
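
The compound-command row is the one most worth seeing in code, because on the Pi side it is what you would have to write yourself. The sketch below is a simplified illustration, not Claude Code's implementation: the patterns are the inner part of a rule such as `"Bash(git *)"`, the per-command precedence (deny over ask over allow) and the "unmatched means ask" default are assumptions, and the splitter ignores quoting and subshells.

```typescript
type Decision = "allow" | "ask" | "deny";
type Rules = { allow: string[]; ask: string[]; deny: string[] };

// Turn a pattern like "git *" into an anchored RegExp; "*" matches anything.
function wildcardToRegExp(pattern: string): RegExp {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

function matchesAny(command: string, patterns: string[]): boolean {
  return patterns.some((p) => wildcardToRegExp(p).test(command));
}

// Check one simple command. Assumption: deny beats ask beats allow,
// and a command that matches nothing falls back to "ask".
function checkSimple(command: string, rules: Rules): Decision {
  if (matchesAny(command, rules.deny)) return "deny";
  if (matchesAny(command, rules.ask)) return "ask";
  if (matchesAny(command, rules.allow)) return "allow";
  return "ask";
}

// Naive split on &&, ; and |. Quoted strings and subshells are NOT handled.
function splitCompound(command: string): string[] {
  return command
    .split(/&&|;|\|/)
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
}

const severity: Record<Decision, number> = { allow: 0, ask: 1, deny: 2 };

// Check every part; the most restrictive result wins.
function checkCommand(command: string, rules: Rules): Decision {
  const parts = splitCompound(command);
  if (parts.length === 0) return "ask";
  let worst: Decision = "allow";
  for (const part of parts) {
    const decision = checkSimple(part, rules);
    if (severity[decision] > severity[worst]) worst = decision;
  }
  return worst;
}

const demoRules: Rules = {
  allow: ["git *", "ls", "ls *"],
  ask: ["git push *"],
  deny: ["rm -rf *"],
};

const plain = checkCommand("git status && ls -la", demoRules); // "allow"
const pushes = checkCommand("git status && git push origin main", demoRules); // "ask"
const piped = checkCommand("ls | rm -rf /tmp/x", demoRules); // "deny"
```

The point of splitting first is visible in the last case: an allow rule for `ls` must not let the `rm -rf` on the other side of the pipe through.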

5. Customization surface — hooks vs extensions vs MCP

This is the biggest philosophical split. All three are ways to inject custom behavior; they answer the same questions very differently.

Dimension	Claude Code: hooks	Claude Code: MCP	Pi: extensions
What is it	Shell command, HTTP POST, LLM prompt, or full agent — bound to a lifecycle event	External process/server exposing tools over the Model Context Protocol	TypeScript module loaded in-process via jiti
Configured as	JSON in settings.json	JSON in `.mcp.json` / `~/.claude.json`; `claude mcp add`	Default-exported factory function; auto-discovered from `~/.pi/agent/extensions/` or `.pi/extensions/`
Hook surface	Fixed enum of ~20 events (PreToolUse, PostToolUse, Stop, SessionStart, UserPromptSubmit, PreCompact, ...)	None — MCP is for tool addition, not interception	30+ typed events, all hooks listed in Chapter 11. Some can block, some can mutate.
Adds new tools?	No — hooks decorate existing tools	Yes — primary use case	Yes — `pi.registerTool()` at load or runtime
Can block tool calls?	Yes — exit code 2 on PreToolUse	No (separate permission system gates calls)	Yes — return `{ block: true, reason }` from `tool_call`
Can mutate tool inputs?	Indirect (block + tell Claude to retry differently)	No	Yes — mutate `event.input` in place
Can mutate tool results?	PostToolUse can react but not transform	No	Yes — return partial patch from `tool_result`
Can modify system prompt?	SessionStart stdout becomes context	No	Yes — `before_agent_start` returns new `systemPrompt`
Can add commands?	Via skills (separate system) or plugins	No	Yes — `pi.registerCommand()` with autocompletion
Can add keybindings?	No	No	Yes — `pi.registerShortcut()`
Process model	Subprocess per hook fire	Long-lived subprocess or HTTP/SSE	In-process, same Node runtime
Language	Any (shell)	Any (defines the wire protocol)	TypeScript
Failure isolation	Hook process can fail without crashing Claude Code	Server can fail; tools become unavailable	Bad extension can crash Pi; you own the runtime
Performance	Process spawn per fire	One process, JSON-RPC overhead per call	Function call

Translate this into the talk's framing: Claude Code's hooks let you observe and gate; MCP lets you add; Pi's extensions let you do everything in one cohesive surface. The cost of Pi's surface is that you write TypeScript and you assume responsibility for not crashing your harness. The benefit is that you can add any behavior yourself, without waiting for Anthropic to ship a new hook event.
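
The "block" and "mutate" rows in the table are easier to judge against a working shape. What follows is a toy in-process hook bus, not Pi's ExtensionAPI: only the `tool_call` event name and the `{ block: true, reason }` return shape come from the table, while `HookBus`, `onToolCall`, `emitToolCall`, and the `timeoutMs` field are hypothetical names made up for the sketch.

```typescript
// A toy hook bus for illustration. NOT Pi's real ExtensionAPI.
type ToolCallEvent = { toolName: string; input: Record<string, unknown> };
type ToolCallResult = { block: true; reason: string } | undefined;
type ToolCallHandler = (event: ToolCallEvent) => ToolCallResult | Promise<ToolCallResult>;

class HookBus {
  private handlers: ToolCallHandler[] = [];

  // Subscribe to the (hypothetical) tool_call event.
  onToolCall(handler: ToolCallHandler): void {
    this.handlers.push(handler);
  }

  // Run handlers in registration order; the first block wins.
  async emitToolCall(event: ToolCallEvent): Promise<{ blocked: boolean; reason?: string }> {
    for (const handler of this.handlers) {
      const result = await handler(event);
      if (result !== undefined) return { blocked: true, reason: result.reason };
    }
    return { blocked: false };
  }
}

const bus = new HookBus();

// Handler 1: a permission gate that blocks destructive shell commands.
bus.onToolCall((event) => {
  const command = String(event.input["command"] ?? "");
  if (event.toolName === "bash" && command.includes("rm -rf")) {
    return { block: true, reason: "rm -rf is not allowed" };
  }
  return undefined;
});

// Handler 2: mutate the input in place, here by adding a default timeout.
bus.onToolCall((event) => {
  if (event.toolName === "bash" && event.input["timeoutMs"] === undefined) {
    event.input["timeoutMs"] = 30_000;
  }
  return undefined;
});

const okEvent: ToolCallEvent = { toolName: "bash", input: { command: "ls" } };
const badEvent: ToolCallEvent = { toolName: "bash", input: { command: "rm -rf /" } };
const okOutcome = bus.emitToolCall(okEvent); // not blocked; okEvent.input gains timeoutMs
const badOutcome = bus.emitToolCall(badEvent); // blocked with the gate's reason
```

Because the handlers are ordinary function calls in the same process, gating and transforming share one mechanism. The same property is why an exception in a handler is the harness's problem, which is the failure-isolation trade-off the table names.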

6. Skills

Both tools implement skills against the Agent Skills standard with minor extensions. The frontmatter fields converge; the discovery and invocation models differ in small ways.

Dimension	Claude Code	Pi
Format	SKILL.md in `.claude/skills/<name>/`	SKILL.md in `~/.pi/agent/skills/`, `.pi/skills/`, `.agents/skills/`, etc.
Frontmatter	`description`, `argument-hint`, `allowed-tools`, `when_to_use`, `model`, `user-invocable`, `context: fork`, `paths`, `hooks`	Standard `name` + `description` + optional `license`, `compatibility`, `metadata`, `allowed-tools`, `disable-model-invocation`
Argument substitution	`$ARGUMENTS` + named args via `arguments: [name, dir]` then `$name`	Args appended to skill content as `User: <args>` on `/skill:name args`
Inline shell at invocation	Yes: !`git log -20` runs and inserts output	No special syntax; skills can describe scripts to run via tools
Path-activated	Yes via `paths`	Skills always discoverable; activation up to the model based on description
Per-skill model	Yes via `model:`	No (use the extension `before_agent_start` to switch)
Subagent fork	Yes via `context: fork`	Not built-in
Bundled skills	Yes — compiled into the binary	No; install from anthropics/skills or pi-skills
Cross-harness skill sharing	Skills are CC-specific by default but standard-compliant	Pi can load CC skill directories: add `~/.claude/skills` to the `skills` array in settings

Pi's nontrivial move: it can adopt the Claude Code skill ecosystem wholesale. The standard is the same; Pi is the more lenient implementation.

7. Multi-agent

Dimension	Claude Code	Pi
Topology	Top-down: parent spawns child via `Task`. Strict tree. Sub-agents do not see siblings. Sub-agents do not spawn teammates (flat roster); they can spawn their own children.	Flat by default: every agent is a peer. Optional orchestrator pattern by convention.
Communication	One-way: parent passes prompt, child returns one final result.	Bidirectional: four-tool protocol (`list_agents`, `send_to_agent`, `await_reply`, `check_inbox`). See Chapter 14.
Context	Fresh window (or inherit if forked). Result capped at 100,000 chars.	Each peer has its own session; messages flow between them through the comms extension.
Process	Local in-process or remote (when eligible). Background mode supported.	Multiple Pi processes (tmux, separate machines via comms-net HTTP).
Isolation	`isolation: "worktree"` gives each agent its own git worktree.	Process isolation by default; worktree via tmux + git.
Persistent memory	Per agent type: `~/.claude/agent-memory/<type>/MEMORY.md`.	Per agent (named via flag). Sessions are persistent already.
Cancellation	Background agents survive parent's Escape; cancel via tasks panel.	Per-process; `session_shutdown` hooks fire on each.

This is the deepest architectural divergence. Claude Code's multi-agent is delegation. Pi's is collaboration. Each subsumes the other in theory; in practice, the topology you start with shapes what kinds of work you'll do.

8. Sessions and compaction

Dimension	Claude Code	Pi
Storage	JSON transcripts in `~/.claude/`. Session ID assigned at start.	JSONL (one entry per line) in `~/.pi/agent/sessions/--<path>--/<ts>_<uuid>.jsonl`. Versioned (v3). See Chapter 12.
Tree structure	Linear transcript.	Tree via `id`/`parentId`. Branching is moving the leaf back; abandoned branch stays in the file.
Resume	`--resume <id>` or `--resume` for picker. Memory re-discovered (may differ from original).	`/resume` in TUI; `SessionManager.continueRecent()` in SDK.
Branching / fork	Not first-class; sessions are linear.	First-class: `/fork`, `/clone`, `/tree` navigation, in-place branch via `sm.branch(entryId)`.
Compaction trigger	Auto: oldest messages summarized when window fills. Raw transcript preserved.	Auto: when `contextTokens > contextWindow - reserveTokens` (default reserve 16,384). Or `/compact [instructions]`.
Compaction algorithm	Implementation detail; the docs commit to "preserves the raw transcript."	Documented in full: keep recent 20k tokens, summarize earlier, structured summary format (Goal / Progress / Decisions / Next / Critical Context + tagged files). See Chapter 12.
Custom compaction	PreCompact hook can inject instructions (exit 0 stdout) or block (exit 2).	Full custom compaction via `session_before_compact` extension hook: provide your own summary with custom data in `details`.
Branch summarization	N/A (no branches).	When you navigate the tree, Pi offers to summarize the abandoned branch and inject the summary into the new branch.
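
Pi's compaction trigger and keep-recent window, as described in the table, reduce to two small functions. This is a sketch under assumptions, not Pi's code: the two default values come from the table, but the `Entry` shape, the function names, and the choice to stop at the first entry that would exceed the budget are illustrative.

```typescript
// Defaults taken from the table above; everything else is illustrative.
const RESERVE_TOKENS = 16_384;
const KEEP_RECENT_TOKENS = 20_000;

type Entry = { id: string; tokens: number };

// Trigger: compact once the context no longer leaves room for the reserve.
function shouldCompact(
  contextTokens: number,
  contextWindow: number,
  reserveTokens = RESERVE_TOKENS,
): boolean {
  return contextTokens > contextWindow - reserveTokens;
}

// Walk back from the newest entry, keeping entries verbatim until the budget
// would be exceeded. Returns the index of the first kept entry.
function findCutIndex(entries: Entry[], keepRecentTokens = KEEP_RECENT_TOKENS): number {
  let kept = 0;
  let cut = entries.length;
  for (let i = entries.length - 1; i >= 0; i--) {
    const entry = entries[i];
    if (entry === undefined || kept + entry.tokens > keepRecentTokens) break;
    kept += entry.tokens;
    cut = i;
  }
  return cut;
}

// Rebuild context as [summary, ...kept]. The summarizer is supplied by the caller.
function compact(entries: Entry[], summarize: (older: Entry[]) => Entry): Entry[] {
  const cut = findCutIndex(entries);
  if (cut === 0) return entries; // everything already fits in the recent window
  return [summarize(entries.slice(0, cut)), ...entries.slice(cut)];
}

const history: Entry[] = [
  { id: "e1", tokens: 15_000 },
  { id: "e2", tokens: 12_000 },
  { id: "e3", tokens: 9_000 },
  { id: "e4", tokens: 8_000 },
];

const needsCompaction = shouldCompact(190_000, 200_000); // 190000 > 183616
const compacted = compact(history, (older) => ({
  id: "summary",
  tokens: Math.ceil(older.reduce((sum, e) => sum + e.tokens, 0) / 10), // pretend 10x compression
}));
// compacted holds the summary followed by e3 and e4
```

With a 200,000-token window the trigger fires at 183,617 tokens, which shows why the reserve matters: it is headroom for the model's reply, not slack in the history.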

9. Providers and models

Dimension	Claude Code	Pi
Providers	Anthropic only. Auth via Claude Pro/Max subscription or Anthropic Console billing.	15+ built-in: Anthropic, OpenAI, Google, Azure, Bedrock, Mistral, Groq, Cerebras, xAI, HuggingFace, Kimi For Coding, MiniMax, OpenRouter, Ollama. Plus custom via `pi.registerProvider()`.
Auth	OAuth (Pro/Max) or API key (Console). `apiKeyHelper` script. `forceLoginMethod` for enterprise.	API keys via env or `auth.json`; OAuth supported for any provider via `pi.registerProvider({ oauth: {...} })`; runtime override via `setRuntimeApiKey()`.
Model switching	`--model` at launch or `/model` mid-session.	`--model`, `/model`, or `Ctrl+L`. Cycle favorites with `Ctrl+P`. Per-session `scopedModels`.
Thinking level	`alwaysThinkingEnabled` + `effortLevel` (low/medium/high).	6 levels: off, minimal, low, medium, high, xhigh. `pi.setThinkingLevel()` at runtime.
Enterprise model lockdown	`availableModels` managed setting (allowlist).	Custom `models.json` per-org; `ModelRegistry` filtering.

10. Programmatic surfaces

Surface	Claude Code	Pi
One-shot prompt	`claude -p "prompt"` (stdin / print mode)	`pi -p "prompt"` + `--mode json` for event stream
JSON event stream	Limited (transcript Ctrl+O, hook stdin)	`pi --mode json` writes session header + every `AgentSessionEvent` as JSONL
RPC subprocess	None (use `claude mcp serve` to expose CC as an MCP server instead)	`pi --mode rpc`: JSONL over stdin/stdout; full command + event surface
Embedded SDK	Not exposed as a public Node SDK; the binary is the interface.	`@earendil-works/pi-coding-agent` SDK: `createAgentSession()`, typed events, custom tools at the SDK layer.
As an MCP server	`claude mcp serve` — turn Claude Code itself into an MCP endpoint	Build via extension if you need it; not a built-in mode
Extension UI from headless	N/A — hooks are headless	Extension UI sub-protocol over RPC: dialogs, status, widgets relayed to the client
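
A client that drives `pi --mode rpc` mostly needs to frame JSONL correctly. The sketch below shows only that framing layer, under assumptions: `JsonlDecoder` and `encodeCommand` are hypothetical helpers, and the command and event shapes in the usage lines are invented, since the real contract lives in Pi's RPC docs.

```typescript
// Incremental decoder for LF-delimited JSON. Chunks from a pipe can split a
// line anywhere, so the tail without a newline is buffered until the next chunk.
class JsonlDecoder {
  private buffer = "";

  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop() ?? ""; // the last element is the incomplete tail
    return lines
      .map((line) => line.trim())
      .filter((line) => line.length > 0)
      .map((line) => JSON.parse(line) as unknown);
  }
}

// Encode one command as exactly one line. JSON.stringify escapes embedded
// newlines, so the only raw LF is the terminator.
function encodeCommand(command: Record<string, unknown>): string {
  return JSON.stringify(command) + "\n";
}

// Usage with made-up event and command shapes.
const decoder = new JsonlDecoder();
const firstBatch = decoder.push('{"type":"a"}\n{"ty'); // one complete event
const secondBatch = decoder.push('pe":"b"}\n'); // the split event, now complete
const wireLine = encodeCommand({ type: "prompt", message: "hi\nthere" });
```

In a real client the decoder would be fed from the child process's stdout and `wireLine` written to its stdin. That wiring is left out here to keep the sketch free of runtime-specific APIs.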

11. Settings and packaging

Dimension	Claude Code	Pi
Settings scopes	4: user (`~/.claude/settings.json`), project (`.claude/settings.json`), local (`.claude/settings.local.json`), managed (MDM/registry/plist).	2 by default: user (`~/.pi/agent/settings.json`), project (`.pi/settings.json`). Project overrides global. Enterprise lockdown via filesystem permissions.
JSON schema	Yes: `https://schemas.anthropic.com/claude-code/settings.json`	TypeBox-typed in source; no public hosted schema URL
Package format	Plugins (npm/git) carrying skills, agents, hooks, MCP. Managed settings can lock to plugin-only sources.	Pi packages: npm:/git: refs in settings carrying extensions + skills + prompts + themes. Filtered via the `packages` array.
Versioned pinning	npm semver, git refs.	Same: `npm:@foo/pkg@1.2.3`, `git:host/user/repo@v1`. Versioned specs skip `pi update`.
Try-without-installing	N/A as a first-class concept	`pi -e <source>` installs to temp for the run
Updating	`claude update`	`pi update` (Pi + packages), `--self` (Pi only), `--extensions` (packages only), per-package update

12. Distribution and openness

Dimension	Claude Code	Pi
License	Closed (Anthropic).	MIT. Source at earendil-works/pi-mono.
Vendor relationship	You depend on Anthropic.	You depend on Earendil Inc. for upstream; you can fork.
Telemetry/account	Subscription account or Console billing.	None unique to Pi; depends on provider you authenticate to.
Roadmap influence	Anthropic-driven; community can file issues.	Same plus extensions: anything you wish existed, you can implement.

The pattern, summarized

Across every row above, the structural difference repeats: Claude Code picks reasonable defaults and exposes a configuration surface; Pi exposes the primitive and lets you build the default. The talk's "floor vs ceiling" framing is literal — Claude Code is what you get without effort; Pi is what becomes possible with effort.

What this doesn't say

Nothing above implies one tool is better. They optimize for different users. Claude Code optimizes for the engineer who wants a great agent immediately. Pi optimizes for the engineer who wants a custom agent eventually. Most teams will use one or both depending on the task. The choice is the topic of the next chapter.

Source notes

Claude Code data from the community wiki: mintlify.wiki/VineeTagarwaL-code/claude-code (the source URL on every page is cited there). Pi data from pi.dev/docs/latest and the source on github.com/earendil-works/pi-mono.

Chapter 18·Comparison

Selection guide — when each fits #

"Which one should I use?" has three honest answers, not one. This chapter is the decision framework: when Claude Code is the right floor, when adopting Claude Code plus pushing on the customization surface is the move, and when owning the harness via Pi pays back the investment.

Three scenarios, three answers

Scenario A — Claude Code, out of the box

You should pick this when:

You want the best agentic coding experience available with zero setup beyond claude.
Your work is general software engineering: fixes, features, refactors, exploration.
You're happy with Anthropic models; provider lock-in is a non-issue for you.
The customization you need fits in CLAUDE.md, permission rules, and one or two hooks.
You value polish, predictable updates, and "Anthropic operates this for me."

What this looks like in practice: one CLAUDE.md in the project root, allow-rules for your common bash commands, two hooks (Prettier on PostToolUse + npm test on Stop), one or two skills for your team's repeated workflows. You'll stop here for months.

Scenario B — Claude Code, push the surface

You should pick this when:

Scenario A is most of your work but a specific class of task needs external systems Claude doesn't reach (databases, internal APIs, design tools, observability).
You want explicit safety policies enforced (block rm -rf, sandbox bash to a container, require human approval for production access).
Your team needs shared workflows beyond CLAUDE.md (multi-step deployment, structured PR reviews, code generators).
You'll occasionally write a hook that's a real program (validation, classification, integration).

What this looks like in practice: the things from Scenario A, plus 3–8 MCP servers (your DB, ticket tracker, deploy tool, ...), 5–15 skills, a few non-trivial hooks (LLM-prompt hooks or full agent hooks for verification), and a managed-settings policy if you're at an org that needs lockdown. This is the practical ceiling for most teams.

Scenario C — Pi, own the harness

You should pick this when:

You need a behavior that requires mutating tool inputs/outputs, intercepting the system prompt, or replacing compaction. That class of behavior is unreachable from Claude Code's hook enum.
You need to use providers other than Anthropic (Bedrock for compliance, OpenAI for a specific capability, Ollama for offline, your in-house gateway, mid-session model switching across all of them).
You want peer-to-peer multi-agent communication, not top-down delegation (see Chapter 14). Or you want to swap topologies as you learn.
You want the session tree (branch, fork, clone, in-place navigation) as a first-class object you can manipulate.
You're shipping a product on top of an agent loop and you need the SDK and RPC surfaces.
You believe the architectural framing from the talk: harness ownership compounds, and owning the harness costs less than waiting for Anthropic to ship the feature you need.

What this looks like in practice: a small .pi/ directory with a few extensions (permission gate, redactor, the comms-net extension if you're doing peer-to-peer), a SYSTEM.md per-project, a couple of skills imported from anthropics/skills, and an in-house pi package you share via git that bundles your team's extensions and prompts. You spend more time building, and you stop being blocked.

The decision framework, in five questions

Do you need a provider other than Anthropic? If you must use Bedrock, OpenAI, your own gateway, or local models, choose Pi. Claude Code does not solve this.
Do you need to intercept or mutate inside the loop? Mutating tool inputs, redacting tool results before the LLM sees them, replacing compaction with your own algorithm, modifying the system prompt per-turn — Pi. Claude Code's hooks observe and gate; they do not transform.
Do you need peer-to-peer multi-agent or branching topologies? If your work model is "agents that talk to each other as equals" or "explore three approaches in branches I can switch between" — Pi. Claude Code's Task tool is strict top-down delegation with linear sessions.
Are you building a product on top? If you need an SDK in Node or RPC from another language — Pi has both as first-class. Claude Code's headless surfaces are aimed at scripts and CI.
None of the above? Claude Code. Without one of those reasons, carrying your own harness is not worth the polish you'd give up.
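
The five questions above collapse into one short function: any "yes" among the first four points to Pi, and no "yes" at all points to Claude Code. This is only a restatement of the checklist in code. The `Needs` fields and `chooseHarness` are names invented for the sketch.

```typescript
// One boolean per question 1-4; question 5 is the fall-through case.
type Needs = {
  nonAnthropicProvider: boolean; // Bedrock, OpenAI, a gateway, local models
  mutateInsideLoop: boolean; // mutate inputs/results, custom compaction, per-turn prompt
  peerToPeerOrBranching: boolean; // agents as equals, or switchable branches
  buildingProduct: boolean; // needs an SDK or RPC surface
};

function chooseHarness(needs: Needs): { choice: "Pi" | "Claude Code"; reasons: string[] } {
  const reasons: string[] = [];
  if (needs.nonAnthropicProvider) reasons.push("needs a provider other than Anthropic");
  if (needs.mutateInsideLoop) reasons.push("needs to intercept or mutate inside the loop");
  if (needs.peerToPeerOrBranching) reasons.push("needs peer-to-peer agents or branching sessions");
  if (needs.buildingProduct) reasons.push("is building a product on the agent loop");
  // With no named reason, the default stays on the floor.
  return { choice: reasons.length > 0 ? "Pi" : "Claude Code", reasons };
}

const everydayWork = chooseHarness({
  nonAnthropicProvider: false,
  mutateInsideLoop: false,
  peerToPeerOrBranching: false,
  buildingProduct: false,
}); // choice: "Claude Code", no reasons

const offlinePipeline = chooseHarness({
  nonAnthropicProvider: true,
  mutateInsideLoop: true,
  peerToPeerOrBranching: false,
  buildingProduct: false,
}); // choice: "Pi", two reasons
```

Returning the reasons alongside the choice keeps the decision honest: if the list is empty, there is no named reason to leave the floor.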

What "use both" looks like

It's a common pattern. The talk's own framing is "I still use Claude Code all the time" alongside Pi. A reasonable split:

Claude Code for daily IDE-like work, exploration, one-off bug fixes, code review.
Pi for production agentic systems, custom pipelines, anything that runs unattended, anything that needs a model other than Claude, anything that requires deep customization of the loop.

The dividing line is roughly "tool you use to think and write code" vs "tool you embed in a system that runs without you." Both are agent harnesses; their target users overlap but their optimization targets don't.

Migration costs, honestly

From	To	What you lose	What you gain
Claude Code	Pi	4-level CLAUDE.md hierarchy (collapses to 2 levels), built-in `Task` tool (rebuild with comms), built-in permission modes (rebuild with extension), TodoWrite panel, attribution defaults, polish on a thousand small things.	Provider freedom, full loop control, the tree, mid-session model cycling, SDK + RPC + JSON modes, primitives over features.
Pi	Claude Code	Provider variety, the tree, peer-to-peer comms, extension-mutated inputs/outputs, SDK access.	Polish, sub-agents as a built-in, MCP ecosystem ready-made, plan mode, managed-settings lockdown for enterprise, 4-level memory hierarchy out of the box.
Both	One	The option to match the tool to the task.	Lower operational complexity; less switching cost; clear ownership of the workflow.

The hardest question

Most teams pick the wrong tool not because they misjudged the tools but because they misjudged themselves. The honest version of the decision is:

"I will write extensions" — if true, Pi pays back. If you say it but you won't, you'll get worse results than just running Claude Code.
"I need the customizations" — if real customer behavior depends on them, Pi pays back. If they're "nice to have," you'll spend more time building the harness than using it.
"I want the leverage" — only true if you have the kind of work where leverage compounds (recurring patterns, multi-step pipelines, things you'll run thousands of times). If your work is bespoke one-offs, owning the harness costs more than it earns.

The honest one-line guide

Use Claude Code unless you have a specific, named reason not to. The reasons are real and the talk catalogs them; absent those reasons, the polish wins. If you have the reasons, Pi pays back faster than you think because the loop is small (Chapter 10) and the API is cheap to extend (Chapter 11).

The talk's framing, reread

"Claude Code is the floor. It's not the ceiling. It's just the beginning of what's possible with tools like this." Read literally: Claude Code is what's available without effort. Pi is what's available with the effort of owning your harness. Most engineers should stop at the floor most days. The top 2% the talk refers to are the engineers who picked the right days to push past it.

If you only remember one thing

The question is not "which tool wins." The question is "what work am I trying to compound?" If your work compounds — recurring patterns, repeatable pipelines, factories rather than features — the harness you control returns that compounding to you. If your work doesn't compound, a great floor is enough.

Chapter 19·Appendix

Glossary #

Agentic engineering: The process of engineering with intelligence that can operate on your behalf. Distinct from prompt-tuning (configuring a single agent's behavior) and traditional software engineering (writing the logic yourself).
Agent harness: The runtime that hosts an LLM-driven loop. Owns the system prompt, tool registry, context window strategy, I/O channels, permission model, and subprocess lifecycle.
Software factory: A system of agents plus deterministic code that produces engineering output on spec, repeatably, from a single prompt. Stages typically include plan, plan-review, scout, validate, build, test, review.
ADW — AI Developer Workflow: The speaker's term for a software factory pipeline. Combines agents and code to outperform either alone.
Dark factory: Industry term for a software factory that runs without a human on the critical path. Borrowed from "lights-out" manufacturing.
ZTE — Zero Touch Engineering: The asymptote where a prompt produces a production-ready release with no human intervention. Stated as out of scope for most teams today.
Extensible software: Software architected so that change is added via new modules at well-defined extension points rather than by modifying existing modules. The Open-Closed Principle, restated for the agentic era.
AFK agent: An always-on agent that produces value while the operator is away from keyboard. The ceiling, not the entry move. Earned by first proving the token arbitrage.
Tokenomics: The three-level funnel of token spend: maximize spend (level 1), make spend useful (level 2), capture revenue from the value created (level 3). Always-on is only justified at level 3.
Token max: Spending tokens without yet tying them to outcomes. A necessary first move, a terrible place to finish.
Token arbitrage: The gap between the cost of a token and the value (in revenue or time) the token produces when routed through your system.
Token tax: Unnecessary token spend caused by missing API access. An agent that scrapes, parses, retries, or asks the human is paying a tax that the right tool surface would eliminate.
Agentic access: The set of APIs, CLIs, RPC endpoints, and webhooks an agent can programmatically reach. The scope of what the agent can do for you.
Agentic speed: The execution rate of an agent operating on digital information. Stated by the speaker as 10x to 1000x human speed, gated entirely by whether the agent has access to the relevant tool surface.
Pi (the agent): A minimal terminal coding harness from Earendil Inc. Used in the talk as the example of an extensible harness. Homepage: pi.dev.
Peer-to-peer agent communication: A flat topology where every agent can talk to every other agent as an equal. No orchestrator. Information flows bidirectionally. Contrast with sub-agent delegation, message-queue, and agent-chain topologies.
Pi-to-Pi (or "pietoie"): The speaker's name for peer-to-peer communication between Pi agents. Implemented as a four-tool extension (list, send, await, check) over either an in-process pool or a Bun HTTP server.
comms / comms-net: The two reference extensions in the "Pi vs Claude Code" repo. comms is single-device, in-process. comms-net adds a lightweight HTTP server so agents on different machines can join the pool.
Verifier pattern: A second agent whose job is to check the work of the primary agent. Increases token spend, decreases error rate. In peer-to-peer, the verifier is a peer rather than a parent.
Focused context window: The discipline of keeping each agent's context narrow to one task. "A focused agent is a performant agent." Larger context windows do not remove the discipline; they raise the temptation to ignore it.
Context engineering: Not getting all the right things into the window. Getting just the right things. The art of choosing what to include, what to summarize, and what to leave out.
Flat information hierarchy: An organizational structure (or agent topology) where ideas can travel between any two participants without going up and back down a chain of command. Argued to outperform hierarchical structures because the best information often lives at the bottom.
Agent loop: The universal cycle every coding agent runs: build context, call model, append response, execute tool calls, append results, repeat until no more tool calls. ~60 lines of code; see Chapter 10.
Tool registry: A dictionary of named functions exposed to the model. Each tool has a JSONSchema-typed parameter set, a description shown to the model, and an executor that returns { content, details, isError }.
Context strategy: The pure function that takes the current session and produces the message list the model sees. Owns compaction, branch-summary injection, and tool-result truncation.
Hook bus (extension bus): The typed pub/sub layered over the agent loop. Extensions subscribe to lifecycle events; the loop awaits their handlers and respects their return values. The architectural lever that makes harness ownership cheap.
JSONL session: The append-only file format Pi uses for sessions. One JSON object per line, first line is the header, every subsequent line is a typed entry with id/parentId forming a tree.
Session entry: A single line in the JSONL session file. Typed: message, compaction, branch_summary, custom, custom_message, model_change, thinking_level_change, label, session_info.
Tree (in a session): The structure formed by entries' parentId pointers. Branching is moving the leaf back; the abandoned branch stays in the file but is no longer on the active path.
Compaction: Pi's mechanism for keeping a long conversation within the model's context window. Walks back collecting tokens, summarizes everything earlier than keepRecentTokens into a CompactionEntry, rebuilds context from [summary, kept...].
Branch summary: A summary of an abandoned branch, generated when the user navigates the tree to a different leaf. Travels with the new branch so context isn't lost.
Structured summary format: Pi's summarization template: Goal / Constraints / Progress (Done, In Progress, Blocked) / Key Decisions / Next Steps / Critical Context, plus <read-files> and <modified-files> tags. Keeps the model from treating the summary as a conversation to continue.
Steer vs follow-up: Two ways to queue a message while the agent is streaming. steer is delivered after the current tool call, before the next LLM call. follow-up waits until the agent has fully stopped.
RPC mode: Pi's subprocess protocol: JSON commands on stdin (one per LF-delimited line), JSON events and responses on stdout. The contract is in the RPC docs.
SDK (AgentSession): Pi's in-process API. createAgentSession() returns an AgentSession with prompt(), steer(), followUp(), subscribe(), model controls, and tree navigation.
Extension factory: The default-exported function in a Pi extension file. Receives ExtensionAPI; sync or async. Returning a Promise makes Pi wait before session_start fires.
ExtensionAPI / ExtensionContext: The two surfaces an extension sees. ExtensionAPI is methods on pi (register tools, commands, providers, shortcuts; send messages; control state). ExtensionContext is passed to every handler and exposes ctx.ui, ctx.sessionManager, ctx.signal, etc.
Skill (Pi): A capability package with a SKILL.md and freeform supporting files. Discovered on startup; only descriptions go in the system prompt. Full content loads on-demand via read or /skill:name. Follows the Agent Skills standard.
Prompt template: A reusable prompt stored as a Markdown file. Invoked with /name; expanded to the file content before sending.
Pi package: A bundle of extensions, skills, prompt templates, and/or themes shared via npm or git. Manifest in package.json under the pi key, or auto-discovered from convention directories.
Reserve tokens / keep-recent tokens: The two knobs that govern Pi's compaction. reserveTokens (default 16,384) is space saved for the model's response. keepRecentTokens (default 20,000) is the trailing window kept verbatim.
Provider / API kind: Pi separates the network endpoint (provider: Anthropic, OpenAI, Bedrock, Ollama...) from the wire format (api kind: anthropic-messages, openai-completions, openai-responses, ...). 15+ providers map onto ~5 API kinds.
Claude Code: Anthropic's terminal-based coding agent. Closed source, Anthropic-only. The "floor" in this manual's framing: best-in-class out-of-the-box experience, with a customization surface bounded by what hooks and MCP expose.
CLAUDE.md hierarchy: Claude Code's 4-level memory system: managed (/etc/claude-code/CLAUDE.md), user (~/.claude/CLAUDE.md), project (any ancestor CLAUDE.md), local (CLAUDE.local.md). Files closer to cwd load later and win the cascade.
@include directive: Claude Code's mechanism for composing CLAUDE.md from multiple files. @./path, @~/path, @/abs/path. Max 5 levels deep, circular refs detected. Ignored inside fenced code blocks.
Path-scoped rules: Claude Code's .claude/rules/*.md files with frontmatter paths:. The rule only enters context when Claude is working on a matching file. Keeps context lean.
Permission mode (Claude Code): One of default (ask on dangerous), acceptEdits (auto-approve edits, ask on bash), plan (read-only), bypassPermissions (skip checks). Set per-session or per-project via settings.
Permission rule: An allow/deny/ask entry in Claude Code settings. Format: "Bash(git *)", "mcp__server__tool". Wildcard matching. Compound bash commands split and checked independently; most restrictive result wins.
Hook (Claude Code): A shell command, HTTP POST, LLM prompt, or full agent triggered by a Claude Code lifecycle event (PreToolUse, PostToolUse, Stop, SessionStart, UserPromptSubmit, PreCompact, ...). The exit code controls behavior: 0 succeeds, 2 blocks, and any other code shows stderr to the user.
MCP (Model Context Protocol): An open standard for connecting agents to external tools and data. Servers expose tools that appear in Claude Code as mcp__<server>__<tool>. Three transports: stdio, HTTP, SSE. Pi does not ship MCP support; it can be added via extension or replaced with skills that wrap CLI tools.
Task tool / sub-agent (Claude Code): Claude Code's built-in mechanism for spawning a sub-agent with isolated context, optionally restricted tools, foreground or background, optional worktree isolation. Strict top-down: parent passes prompt, child returns one final result. Contrast with Pi's peer-to-peer model.
Worktree isolation: The Claude Code sub-agent option isolation: "worktree", which gives the agent its own git worktree so changes don't touch your working directory until you merge. Pi achieves the same via tmux + git from an extension.
TodoWrite: Claude Code's built-in structured task list. Items have statuses (pending, in_progress, completed); renders in a persistent panel in the TUI. Pi's equivalent: write to TODO.md or build an extension.
Plan mode: Claude Code permission mode that blocks all writes and bash. Claude can read, search, and discuss, but must exit plan mode (via ExitPlanMode) to make changes. Pi: build via extension.
Managed settings: Claude Code's enterprise-control layer. Pushed via MDM (macOS), registry (Windows), or platform-specific file path. Locks: allowManagedHooksOnly, allowManagedPermissionRulesOnly, allowManagedMcpServersOnly, strictPluginOnlyCustomization. Takes precedence over user/project/local.
Plugin (Claude Code): A bundle of skills, agents, hooks, and MCP configs distributed via npm or git. The closest analog to a Pi package. Can be locked down via managed settings.
Floor vs ceiling: The talk's framing for the relationship between Claude Code and Pi. Claude Code is the floor (great baseline available without effort). Pi is the ceiling (what becomes possible with effort). Most engineers should pick the floor most days; the leverage lives in choosing the right days to push past it.
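
The Permission rule entry above compresses an algorithm into one sentence: wildcard matching, compound commands split and checked independently, most restrictive result wins. A minimal sketch of that evaluation follows. It is not Claude Code's implementation. Every name in it (Rule, checkCommand, splitCompound, and so on) is hypothetical, the split on shell operators is deliberately naive and ignores quoting, and the fallback to "ask" when no rule matches is an assumption made for illustration.

```typescript
// Illustrative sketch only: all names are hypothetical, not Claude Code internals.
type Decision = "allow" | "ask" | "deny";

interface Rule {
  decision: Decision;
  spec: string; // e.g. "Bash(git *)"
}

// Higher number = more restrictive.
const RANK: Record<Decision, number> = { allow: 0, ask: 1, deny: 2 };

// Extract the pattern from "Bash(<pattern>)"; returns null for non-Bash rules.
function bashPattern(spec: string): string | null {
  const m = /^Bash\((.*)\)$/.exec(spec);
  return m ? m[1] : null;
}

// Turn a "*" wildcard pattern into an anchored regular expression.
function wildcardToRegExp(pattern: string): RegExp {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

// Assumption: naive split on &&, ||, ; and |, ignoring quotes and subshells.
function splitCompound(command: string): string[] {
  return command
    .split(/&&|\|\||;|\|/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// Most restrictive matching rule for one sub-command.
// Assumption: a sub-command that matches no rule falls back to "ask".
function checkOne(sub: string, rules: Rule[]): Decision {
  let result: Decision | null = null;
  for (const rule of rules) {
    const pattern = bashPattern(rule.spec);
    if (pattern === null || !wildcardToRegExp(pattern).test(sub)) continue;
    if (result === null || RANK[rule.decision] > RANK[result]) {
      result = rule.decision;
    }
  }
  return result ?? "ask";
}

// Check every sub-command independently; the most restrictive result wins.
function checkCommand(command: string, rules: Rule[]): Decision {
  return splitCompound(command)
    .map((sub) => checkOne(sub, rules))
    .reduce<Decision>((worst, d) => (RANK[d] > RANK[worst] ? d : worst), "allow");
}
```

With an allow rule for "Bash(git *)" and a deny rule for "Bash(rm *)", checkCommand("git status", rules) returns "allow", while checkCommand("git status && rm -rf build", rules) returns "deny": the allowed half of the compound command does not shield the denied half. That is the point of splitting before matching.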

Chapter 20·Appendix

Primary sources #

Where to go to verify claims and to go deeper. Each source is linked once here so it is easy to find when the body text references it.

The talks and the speaker

Andy "Dev Dan" Hennings (IndyDevDan), channel and writing on agentic engineering: agenticengineer.com.
Talk 1: "Top 1 Opportunity for Senior Engineers" — the five pillars overview that anchors Chapters 01-07 of this wiki.
Talk 2: "Pi to Pi Agent Communication" — the worked example of peer-to-peer harness extension, anchoring Chapter 08.
Karpathy at the Sequoia AI Ascent (the naming event for "agentic engineering"): sequoiacap.com/ai-ascent.

Pi coding agent

Homepage: pi.dev
Docs: pi.dev/docs/latest
Source: github.com/earendil-works/pi
npm: @earendil-works/pi-coding-agent
Discord: community server
Package directory: pi.dev/packages
Models reference: pi.dev/models
Author blog (Mario Zechner): launch post at mariozechner.at and the MCP essay.

Claude Code (Chapters 16-18)

Anthropic's official Claude Code documentation: docs.claude.com/en/docs/claude-code
Community Claude Code wiki (the source for the facts in those chapters; each page cites its own upstream): mintlify.wiki/VineeTagarwaL-code/claude-code
Concepts — how it works: how-it-works
Concepts — tools: tools
Concepts — memory and CLAUDE.md: memory-context
Concepts — permissions: permissions
Guide — hooks: hooks
Guide — skills: skills
Guide — MCP servers: mcp-servers
Guide — multi-agent: multi-agent
Configuration — settings: settings
Reference — commands overview: commands-overview
MCP standard: modelcontextprotocol.io
Agent Skills standard (followed by both Pi and Claude Code): agentskills.io/specification
Anthropic's skill repository (consumable by both tools): github.com/anthropics/skills

Tools referenced in the case study

Cloud sandbox for agents (the canonical example in Demo 2): e2b.dev.
Persistent-VM sandbox compared in Demo 2: exe.dev.
"Pi vs Cloud Code" reference codebase with the comms and comms-net extensions: see the speaker's channel for the current GitHub link (agenticengineer.com).

Deep-dive sources (Chapters 10-15)

Extensions API reference: pi.dev/docs/latest/extensions
Session file format: pi.dev/docs/latest/session-format
Compaction & branch summarization: pi.dev/docs/latest/compaction
SDK reference: pi.dev/docs/latest/sdk
RPC mode protocol: pi.dev/docs/latest/rpc
JSON event stream mode: pi.dev/docs/latest/json
Skills (Agent Skills standard): pi.dev/docs/latest/skills · agentskills.io spec
Pi packages (sharing extensions): pi.dev/docs/latest/packages
Source: session-manager.ts, compaction.ts, branch-summarization.ts.
Example extensions: examples/extensions (50+ files). SDK examples: examples/sdk.
TypeBox (schema for tool parameters): github.com/sinclairzx81/typebox
Bun (used for the comms-net reference server): bun.sh
jiti (how Pi loads TS extensions without a build step): github.com/unjs/jiti

Foundational principles

Open-Closed Principle, Bertrand Meyer, Object-Oriented Software Construction (1988).
Unix philosophy (small composable tools): Ritchie and Thompson, CACM, 1974; McIlroy interview.
Toyota Production System (the standardization-and-instrumentation precedent for software factories): Toyota global site.
Ford moving assembly line, 1913: The Henry Ford archive.

Tokenomics-adjacent reading

Bill Gurley on the LTV math trap: Above the Crowd, 2012.
Andrew Chen on arbitrages eroding: "The Law of Shitty Clickthroughs."
