Agentic Engineering & Harness Ownership
A reference companion to Andy "Dev Dan" Hennings' five-pillar talk on the top opportunity for senior engineers, with the Pi coding agent as the worked example of what harness ownership looks like in practice. Concepts only. No exercises. Read it once, return to it often.
Foreword
This module is a faithful walk-through of a single 26-minute talk. The speaker compresses two weeks of unplugged thinking into one claim: the gap between low- and high-performing agentic engineers is not the model and not the agent product; it is the system around the agent. Five pillars define that system. The talk argues that owning your agent harness sits at the root of all five.
Throughout the module we treat the Pi coding agent (pi.dev) as the running illustration. Pi is the harness the speaker uses every day. We cite it not as an endorsement but because it is the most concrete public artifact of "extensible by design," and looking at the artifact makes the principles legible.
Each pillar gets its own chapter. Inside each chapter: the core claim, the underlying principle, why it matters, common pitfalls, and a primary-source pointer when the chapter touches material outside the talk. A short glossary and a primary-source index sit at the back.
A note on voice
The speaker is plain-spoken and frames the talk as a "message to myself." This wiki keeps that register. When the talk overstates, we say so. When it understates, we say so. The point is to leave you with calibrated beliefs, not slogans.
The top opportunity for senior engineers
The opportunity has been the same for over a year: agentic engineering. What changed is its size and its proximity to becoming the default.
Core claim
"By the end of 2026, agentic engineering will be the default." The talk argues the window for being early is closing because Andrej Karpathy named the field directly at the Sequoia AI Ascent, and when Karpathy names a thing the industry follows. Translation: the term has captured shared vocabulary and the practice will follow.
The talk's recurring framing: two engineers using the exact same agent with 200k tokens get massively different results. The difference is not the agent. It is the surrounding system. The rest of the module is a taxonomy of that system.
The five pillars, in one line each
- Agent harness. The runtime the agent lives in. Owning it is leverage. Renting it is a ceiling.
- Software factory. The system that builds the system. Build factories, not features.
- Extensible software. Open to extension, closed to modification. Adaptability is the survival trait when models and tools move weekly.
- Always-on agents. AFK agents that work while you sleep. Only useful after you have proven the token arbitrage.
- Agentic access. APIs, CLIs, RPC, webhooks. Agents only command what they can programmatically reach. Anything else is a token tax.
Why these five and not others
The speaker explicitly omits "models" from the list. The reasoning: for 80 to 90 percent of daily work, models matter less than the systems around them. Models are a bleeding-edge concern. Harness, factory, extensibility, always-on, and access are compounding concerns. They keep paying after the model du jour is replaced.
The Karpathy reference is to the Sequoia AI Ascent stage, where he framed "agentic engineering" as a discipline distinct from prompt-tuning and from traditional software engineering. Treat that talk as the canonical naming event.
Agent harness
Whoever controls the agent harness controls your results. The harness is the runtime the agent inhabits, and the runtime decides what is even possible.
Definition
An agent harness is the program that hosts an LLM-driven loop. It owns the system prompt, the tool registry, the context window strategy, the I/O channels, the permission model, and the lifecycle of every sub-process the agent spawns. Two agents using identical model weights but different harnesses are not the same product.
Why ownership is the leverage point
The speaker's argument is straightforward. The agent gives you the speed. The agent lives in the harness. Therefore the harness gates everything: which models you can swap to, which tools you can give the agent, whether you can build a sandbox, whether you can run multi-agent orchestration, whether you can add a verifier loop. If you rent the harness, every one of those gates is somebody else's decision.
Two classes of custom harnesses
The talk distinguishes between general-purpose and domain-specific harnesses. Both are valuable. The second is where most engineers leave money on the table.
- Engineering-pattern harnesses. General-purpose customizations of the loop itself: multi-agent teams, plan-then-act chains, verifier harnesses that have one agent check another agent's work.
- Domain-specific harnesses. One thing done extraordinarily well. A DevOps harness. A testing harness. A billing harness. Specialization is the moat.
What the talk's UIA J Team example actually shows
The speaker demos a three-tier orchestration system built on top of his harness: an orchestrator at the top, team leads in the middle, workers at the bottom, all communicating through a chat-room interface. The point is not the demo. The point is that this shape of system is impossible with a default agent product because the host you are renting does not expose the primitives required.
The strong form of the claim: if you do not own the harness, you cannot build a domain-specific agent. You can only configure within someone else's frame. For one-off tasks this is fine. For a durable advantage it is not.
What "owning" actually means
Ownership is not "you wrote it from scratch." It means you can change the system prompt, swap providers and models mid-session, add or remove tools, layer in permission gates, control compaction, intercept and rewrite messages, and ship those changes to your team without waiting on a vendor release cycle. Pi is one example of a harness designed to make this kind of ownership cheap (pi.dev, source on GitHub).
Common pitfalls:
- Confusing customization with ownership: keystroke bindings and a settings.json are not a harness.
- Building a harness in isolation: the leverage shows up when the harness is reused across many projects, not on the first one.
- Treating the harness as a personal toy: if it doesn't ship to your team, the moat is one person wide.
See Chapter 08 (case study) for a worked example of what harness ownership actually unlocks: a four-tool extension that turns Pi into a peer-to-peer agent network, demonstrated on a PII-safe production-to-dev workflow and a feature-parity build between two cloud sandboxes.
Software factory
Build factories, not features. The unit of engineering work shifts from "the next feature" to "the system of agents and code that produces features on spec, every time."
Core claim
You move your focus into the system that builds the system. The output per unit of time goes parabolic because one prompt invokes a factory that plans, scouts, validates, builds, tests, and reviews on your behalf.
Anatomy of a factory
The talk sketches a pipeline rather than a single step. Each stage is a teachable, templatable workflow.
- Plan / spec. The plan prompt is the formula for how engineering work is described. It is the first place the factory shows up.
- Plan review. A second pass over the plan, often by a different agent, before any code is written.
- Scouting. Locating the right files, modules, and dependencies the change will touch.
- Validation. Constraint checks against the spec before execution.
- Build. Actually producing the change.
- Test. The factory always runs tests. No exceptions.
- Review. A reviewing agent, or a staging environment, or a regression-fixing team of agents.
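The stages above can be sketched as an ordered pipeline over a shared work item that halts at the first failing station. This is a minimal, synchronous illustration under stated assumptions: the stage names mirror the list, but the WorkItem and Stage types, runFactory, and the stub stage bodies are hypothetical stand-ins for real agent calls and deterministic checks, which in practice would be asynchronous.

```typescript
// A unit of work flowing through the factory (fields are illustrative).
interface WorkItem {
  spec: string;
  plan?: string;
  files?: string[];
  patch?: string;
  log: string[];
}

type StageResult = { ok: true } | { ok: false; reason: string };
interface Stage {
  name: string;
  run: (item: WorkItem) => StageResult;
}

// Runs stages in order and stops at the first failure, so a rejected plan
// never reaches "build" and nothing is reported done without passing "test".
function runFactory(item: WorkItem, stages: Stage[]): { done: boolean; failedAt?: string; item: WorkItem } {
  if (!stages.some((s) => s.name === "test")) {
    throw new Error("A factory without a test stage is not allowed.");
  }
  for (const stage of stages) {
    const result = stage.run(item);
    item.log.push(`${stage.name}: ${result.ok ? "ok" : result.reason}`);
    if (!result.ok) return { done: false, failedAt: stage.name, item };
  }
  return { done: true, item };
}

// Hypothetical stub stages; real ones would call an agent or run deterministic code.
const stages: Stage[] = [
  { name: "plan", run: (w) => { w.plan = `Plan for: ${w.spec}`; return { ok: true }; } },
  { name: "plan-review", run: (w) => (w.plan ? { ok: true } : { ok: false, reason: "no plan" }) },
  { name: "scout", run: (w) => { w.files = ["src/auth/lockout.ts"]; return { ok: true }; } },
  { name: "validate", run: (w) => (w.files && w.files.length > 0 ? { ok: true } : { ok: false, reason: "no files in scope" }) },
  { name: "build", run: (w) => { w.patch = `edit ${(w.files ?? []).join(", ")}`; return { ok: true }; } },
  { name: "test", run: (w) => (w.patch ? { ok: true } : { ok: false, reason: "nothing to test" }) },
  { name: "review", run: () => ({ ok: true }) },
];

const outcome = runFactory({ spec: "Fix the Pro-tier lockout", log: [] }, stages);
console.log(outcome.done, outcome.item.log);
```

Because every stage writes to the same log, a failure is localized to the station where it appeared, which is the property the factory metaphor is after.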
Two names you will see in the wild
- ADW (AI Developer Workflow). The speaker's preferred term in his "Tactical Agentic Coding" course. An ADW combines agents plus deterministic code to outperform either alone.
- Dark factory. The industry term for the same idea: an engineering pipeline that runs without a human in the loop on the critical path. Borrowed from manufacturing's "lights-out" factory.
"You are not the engineer that builds the feature. You are the engineer that builds the system of AI plus code that operates on your behalf." This is hard. The talk concedes it. Most engineers' identity is welded to shipping features. Untangling that takes deliberate practice.
Where the ceiling is
The speaker introduces ZTE (Zero Touch Engineering) as the asymptote: prompt directly to production. He flags it as super advanced and out of scope. The honest framing: you do not need ZTE to win. You need a factory that takes you from prompt to "near production" reliably. ZTE is the limit point, not the entry bar.
"Parabolic output per unit of time" is a marketing line. What is defensible is: a working factory makes a class of repeatable work much cheaper and more consistent, and it frees the human for work that does not repeat. The leverage is real. The growth curve depends on how much of your work is repeatable.
The factory metaphor is borrowed deliberately. For background on how repeatability and tolerances drove industrial output, see Henry Ford's moving assembly line in 1913 and Taiichi Ohno's Toyota Production System. The agentic translation is identical: standardize the process, instrument every stage, fix defects at the station they appear.
Extensible software
When models change weekly and tools change daily, brittle software is a liability. Pluggability, composability, and "open to extension, closed to modification" are survival traits.
Core claim
The pace of change is the dominant variable. Models release. Tools release. Prompts evolve. The best response is not to predict; it is to build software that absorbs change without breaking. The speaker frames this as one of two ideas he personally underweighted earlier in his agentic engineering work. The other was the harness.
Two surfaces where extensibility pays
- Engineering surface. Your harness, your factory, your dev tooling. The win is being able to swap a model, slot in a new tool, or test a new prompt without a rewrite.
- Product surface. The software you ship. AI involvement is incidental. The same principle applies: when the rate of change is high, code that adds is cheaper than code that modifies.
What "extensible" looks like in practice
The Pi coding agent is the talk's running example of an extensible harness: extensions are TypeScript modules with access to tools, commands, keyboard shortcuts, events, and the full TUI. Sub-agents, plan mode, permission gates, and sandboxes are not baked in. They are extensions that ship as packages and install from npm or git (Pi extensions docs). The architectural decision is to ship primitives, not features.
Pi explicitly chooses not to ship MCP, sub-agents, plan mode, permission popups, built-in to-dos, or background bash. Each can be added as an extension. The cost is that you do more configuration. The benefit is that the system survives the next pivot in agent tooling without an internal rewrite.
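To make "ship primitives, not features" concrete, here is a minimal sketch of a core that exposes only registries and an event bus, with plan mode and an audit trail arriving as extensions. The HarnessAPI shape, the Harness class, and the extension names are hypothetical; this is not Pi's actual extension API, only an illustration of the architectural choice.

```typescript
// Hypothetical extension surface; a sketch of the idea, not Pi's real API.
interface Tool {
  name: string;
  description: string;
  run: (args: Record<string, unknown>) => string;
}
type HookHandler = (payload: unknown) => void;

interface HarnessAPI {
  registerTool(tool: Tool): void;
  registerCommand(name: string, run: () => string): void;
  on(event: string, handler: HookHandler): void;
}
type Extension = (api: HarnessAPI) => void;

// The core owns only registries and an event bus; it never learns about features.
class Harness implements HarnessAPI {
  readonly tools = new Map<string, Tool>();
  readonly commands = new Map<string, () => string>();
  private readonly handlers = new Map<string, HookHandler[]>();

  registerTool(tool: Tool): void {
    if (this.tools.has(tool.name)) throw new Error(`duplicate tool: ${tool.name}`);
    this.tools.set(tool.name, tool);
  }
  registerCommand(name: string, run: () => string): void {
    this.commands.set(name, run);
  }
  on(event: string, handler: HookHandler): void {
    this.handlers.set(event, [...(this.handlers.get(event) ?? []), handler]);
  }
  emit(event: string, payload: unknown): void {
    (this.handlers.get(event) ?? []).forEach((h) => h(payload));
  }
  // Features arrive by loading extensions, never by editing this class.
  load(...extensions: Extension[]): this {
    extensions.forEach((ext) => ext(this));
    return this;
  }
}

// "Plan mode" as an extension: a command plus a tool, zero core edits.
const planMode: Extension = (api) => {
  api.registerCommand("plan", () => "Plan mode: write the plan to PLAN.md before editing code.");
  api.registerTool({ name: "write_plan", description: "Save the current plan", run: (args) => `saved ${String(args.title)}` });
};

// An audit trail arrives the same way, by listening to events the core emits.
const seen: unknown[] = [];
const audit: Extension = (api) => api.on("tool_call", (payload) => seen.push(payload));

const harness = new Harness().load(planMode, audit);
harness.emit("tool_call", "write_plan");
```

The design point: adding plan mode touched no line of Harness, and removing it is one fewer argument to load(). The cost, as the next section notes, is that registerTool, registerCommand, and on are now contracts you must keep stable.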
Why this is harder than it sounds
"Build pluggable software" is easy to say. The hidden tax is interface design: every extension point is a contract you now have to maintain. Done well, this is a deep module with a small surface and large internal complexity (the Ousterhout ideal). Done badly, it is a brittle plugin system whose every change breaks downstream.
The talk's framing: if you are generating slop and shipping tech debt, extensibility will not save you. Extensibility presumes a deliberate interface boundary. Generated code without that boundary is just more code to maintain, faster.
"Open to extension, closed to modification" was articulated by Bertrand Meyer in Object-Oriented Software Construction (1988) and popularized as the "O" in the SOLID principles. The agentic-era restatement adds: extension points must include the model, the tool registry, and the context strategy, not just the type hierarchy.
Always-on agents (AFK agents)
Always-on is the ceiling, not the entry move. You earn the right to run agents 24/7 by first proving that the tokens you spend create value you can capture.
Core claim
Anyone can spin up an agent in a while-loop. That is "token maxing" and it is the floor. The high move is to turn on agents only after you have verified the token economics. The discipline is in not turning them on prematurely.
Tokenomics in three levels
The talk lays out a three-stage funnel. Each stage gates the next.
| Level | Behavior | Where it leaves you |
|---|---|---|
| 1. Token max | Use more tokens. | Spending without measuring value. |
| 2. Useful tokens | Make those tokens valuable. | Value generated but not captured. |
| 3. Revenue capture | Convert value to revenue or a measurable outcome. | Ready to turn the agent always-on. |
The arbitrage
The unit economics in the speaker's framing: buy a token for one dollar, run it through your business process, produce two dollars of value, capture the difference. Once that loop closes, scale it. This is the same logic that drives ad spend in any growth-stage company. The novelty is that the input good is compute.
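The arbitrage reduces to one line of arithmetic once you decide how much of the created value you actually capture. A minimal sketch; the captureRate input and the 50% scaling threshold are assumptions for illustration, not figures from the talk.

```typescript
// Per-run unit economics in dollars. Every input is an estimate you supply.
interface RunEconomics {
  tokenCostUsd: number;    // what the run cost in tokens
  valueCreatedUsd: number; // what the output is worth (cash, or priced time saved)
  captureRate: number;     // share of that value you actually capture, 0..1
}

// Captured value minus cost: the arbitrage the talk describes.
function capturedMargin(e: RunEconomics): number {
  return e.valueCreatedUsd * e.captureRate - e.tokenCostUsd;
}

// Scale only once the loop closes with room to spare. The 50% default
// threshold is an arbitrary illustration, not a figure from the talk.
function shouldScale(e: RunEconomics, minMarginRatio = 0.5): boolean {
  return capturedMargin(e) >= e.tokenCostUsd * minMarginRatio;
}

// The talk's example: $1 in, $2 of value out, all of it captured.
const closedLoop: RunEconomics = { tokenCostUsd: 1, valueCreatedUsd: 2, captureRate: 1 };
// Same value created, but most of it is never captured (Level 2 in the table).
const leakyLoop: RunEconomics = { tokenCostUsd: 1, valueCreatedUsd: 2, captureRate: 0.25 };

console.log(capturedMargin(closedLoop), shouldScale(closedLoop)); // 1 true
console.log(capturedMargin(leakyLoop), shouldScale(leakyLoop));   // -0.5 false
```

Both loops create the same value; only the capture rate separates a loop worth scaling from one that multiplies waste.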
What "useful" actually means
A useful token is one that contributes to an outcome someone will pay for, in cash or in time saved. The talk is blunt about the failure mode: a million crontab-driven agents are running right now and 90 percent of them are dead-useless and burning cash. The diagnostic is whether you can trace each agent run to a value-bearing artifact.
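The diagnostic can be run as a small audit over a run log: for each agent, what share of its spend traces to a value-bearing artifact? A minimal sketch, assuming a hypothetical log format in which each run records its cost and, if it produced one, the artifact it produced.

```typescript
// One logged agent run. `artifact` names the value-bearing output the run
// produced (merged PR, closed ticket, shipped report), if there was one.
interface RunRecord {
  agent: string;
  costUsd: number;
  artifact?: string;
}

interface AgentAudit {
  spendUsd: number;
  tracedShare: number; // fraction of spend that traces to an artifact
}

function auditRuns(runs: RunRecord[]): Map<string, AgentAudit> {
  const totals = new Map<string, { spend: number; traced: number }>();
  runs.forEach((r) => {
    const t = totals.get(r.agent) ?? { spend: 0, traced: 0 };
    t.spend += r.costUsd;
    if (r.artifact) t.traced += r.costUsd;
    totals.set(r.agent, t);
  });
  const audit = new Map<string, AgentAudit>();
  totals.forEach((t, agent) => {
    audit.set(agent, { spendUsd: t.spend, tracedShare: t.spend === 0 ? 0 : t.traced / t.spend });
  });
  return audit;
}

// Hypothetical log: one agent half-earns its tokens, the other earns none.
const report = auditRuns([
  { agent: "triage-bot", costUsd: 2, artifact: "closed ticket" },
  { agent: "triage-bot", costUsd: 2 },
  { agent: "cron-summarizer", costUsd: 3 },
  { agent: "cron-summarizer", costUsd: 3 },
]);

// Agents with no traceable output are the first candidates to switch off.
const toSwitchOff: string[] = [];
report.forEach((a, agent) => { if (a.tracedShare === 0) toSwitchOff.push(agent); });
console.log(toSwitchOff); // [ 'cron-summarizer' ]
```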
The natural impulse is to turn things on the moment they work. Resist. Validate the arbitrage on a small loop first. Always-on is a force multiplier in both directions: it multiplies your wins and your waste.
What the speaker's own token usage looks like
He claims his token growth is a "very smooth curve" because he refuses to scale anything before the value-capture step. Treat this as a calibration: high-performing agentic engineers are often not the highest-token-spend engineers. They are the ones whose tokens convert.
The arbitrage framing borrows from classic unit economics. For background, Bill Gurley's essay on LTV math is useful, and Andrew Chen's "Law of Shitty Clickthroughs" explains why arbitrages erode and need to be re-found.
Agentic access
Agents only command what they can programmatically reach. Anything you do by hand that an agent could do via API is a tax you pay in tokens, time, and consistency.
Core claim
API access is a requirement of agentic speed. CLIs, REST endpoints, webhooks, RPC clients. If the agent cannot get there, the agent cannot help. The diagnostic question the talk insists on: "If an agent could do this and isn't, why not?"
The token tax, defined
A token tax is any work an agent does inefficiently because you have not given it direct API access. The agent burns tokens scraping, parsing, retrying, or asking the human to do the thing manually, all because the tool surface was missing. The remedy is investment in tool surfaces, not investment in better prompts.
Where to look first
- Codebases and repos: agents need git, gh, build, lint, and test as first-class tools.
- Products you operate: every admin action you can do in a UI should also be reachable via API.
- Devices and infrastructure: deploys, restarts, log queries, metric pulls.
- Internal data: search, query, and writes against your own systems of record.
The talk is explicit. You do not give production access by default. You do not give an agent permission to nuke databases, volumes, or shared infrastructure. The bash tool gets locked down. Agentic access is not the same as agentic carelessness. The point is to remove unjustified friction, not to remove justified guardrails.
How this connects back to the harness
An extensible harness is what makes selective access cheap. In Pi, for example, access is granted through extensions that wire tools, plus permission gates and protected paths that wrap them (permission-gate.ts, protected-paths.ts). The same harness that grants access also enforces the boundary. Without that, access becomes binary and unsafe.
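A permission gate is a preflight hook: it inspects each tool call before execution and either lets it through or refuses it with a reason the model can read. The sketch below mirrors the { block, reason } contract used by the agent loop in the harness architecture chapter; the rule lists, tool names, and call shape are hypothetical and much simpler than Pi's permission-gate.ts or protected-paths.ts.

```typescript
// A pending tool call as a preflight hook sees it (hypothetical shape).
interface ToolCall {
  name: string;
  args: { command?: string; path?: string };
}
// Same contract as a tool_call hook: return a block to refuse the call.
type GateDecision = { block: true; reason: string } | undefined;

// Illustrative rules only; a real list is specific to your infrastructure.
const DANGEROUS_BASH: RegExp[] = [
  /\brm\s+-rf\s+\//,              // recursive delete from an absolute path
  /\bdrop\s+(database|table)\b/i, // destructive SQL
  /\bdocker\s+volume\s+rm\b/,     // deleting persistent volumes
];
// Entries ending in "/" protect a whole directory; others match one file exactly.
const PROTECTED_PATHS = [".env", "infra/prod/"];

function permissionGate(call: ToolCall): GateDecision {
  const { command, path } = call.args;
  if (call.name === "bash" && command !== undefined) {
    const rule = DANGEROUS_BASH.find((re) => re.test(command));
    if (rule) return { block: true, reason: `Blocked: command matches ${rule}` };
  }
  if ((call.name === "write" || call.name === "edit") && path !== undefined) {
    const hit = PROTECTED_PATHS.some((p) => (p.endsWith("/") ? path.startsWith(p) : path === p));
    if (hit) return { block: true, reason: `Blocked: ${path} is a protected path` };
  }
  return undefined; // allowed; the call proceeds to execute()
}

console.log(permissionGate({ name: "bash", args: { command: "rm -rf ./build" } }));
console.log(permissionGate({ name: "edit", args: { path: "infra/prod/main.tf" } }));
```

The gate lives in the same harness that grants the bash and edit tools, which is the point of the paragraph above: access and its boundary are configured in one place.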
The principle "agents only command what they can reach" rhymes with the Unix philosophy of small tools wired together. See Doug McIlroy's notes on building blocks and the original Ritchie and Thompson CACM paper. The agent is the new shell. Your tools are the new pipeline.
The compound effect
The five pillars are not a checklist. They compose. Each one increases the leverage of the others. Owning the harness makes the factory possible. The factory makes always-on safe. Extensibility keeps both from rotting. Agentic access removes the friction that prevents either from running at agent speed.
How the pillars stack
| If you have... | You unlock... |
|---|---|
| Harness ownership | The ability to build a custom factory and to wire selective access. |
| Software factory | Repeatable, on-spec output you can trust enough to leave running. |
| Extensible software | The factory survives model and tool changes without a rewrite. |
| Always-on agents | Productive output during hours you are not at the keyboard. |
| Agentic access | Each pillar runs at agent speed, not human speed. |
The speaker's final framing
"Vibe coding is the lowest hanging fruit. Do not sit in the terminal prompting out your features. Build the software factory. Own the agent harness. Make your products extensible. Learn to arbitrage your tokens. Expose your CLIs and APIs everywhere."
If you remember nothing else: the agent is the engine, the harness is the chassis, the factory is the assembly line, extensibility is the maintainability of the line, always-on is the night shift, and agentic access is the loading dock. A car plant without any one of those is not a car plant.
What the talk explicitly does not say
- It does not name a model. By design. Models are a bleeding-edge concern. The pillars are not.
- It does not promise that any one tool is best. Pi is the example; the principles do not require Pi.
- It does not say this is easy. It says the opposite: this is a software engineering skill that takes deliberate practice.
Pi-to-Pi: agent-to-agent communication
A worked example of what harness ownership unlocks: two (or more) Pi agents that talk to each other as peers, on the same device or across the network, with no orchestrator. Drawn from the second talk in the series, "Pi to Pi Agent Communication."
Chapter 02 made the abstract case for owning the harness. This chapter is the concrete one. The pattern below is impossible inside a rented agent product. It is trivial inside a harness you control.
The thesis in one line
"What is better than one Pi agent? Two Pi agents that actually work together." The point is not the number. The point is the topology. Most multi-agent systems today are top-down: an orchestrator delegates to workers, information flows one way. Pi-to-Pi inverts that: every agent is a peer, every channel is bidirectional, and the best information wins regardless of which agent had it.
Four communication topologies, in order of expressiveness
The talk lays these out as a progression. Each topology is a real pattern with real uses; the higher tiers do not deprecate the lower ones.
| Topology | Direction | Typical use |
|---|---|---|
| Sub-agent delegation | Parent → child (one-way, scoped) | "Do this subtask and report back." The current default. |
| Message queue / broker | Hub-and-spoke through a broker | Coordinated parallel work where one agent owns the queue. (The pattern Claude Code's "agent teams" uses.) |
| Agent chain (deterministic) | Pipeline with code between nodes | AI Developer Workflows. Adds determinism by inserting code at each handoff. |
| Peer-to-peer (bidirectional) | Any agent ↔ any agent | Flat coordination. The new ground Pi-to-Pi opens up. |
The argument leans on a familiar observation from organizational design: in any hierarchy, the best information is usually at the bottom (the people doing the work), and it dies on the way up because it lacks title or authority. Flat structures let valuable information win on its merits. The talk cites Nvidia's famously flat reporting structure and startups generally as examples. The agentic analog: in a delegation tree, the worker agent often has the best context but no channel to share it laterally. Peer-to-peer gives it one.
The four-tool protocol
There is "basically no magic" here. The whole system is four tools exposed to each agent:
- list: enumerate the other agents currently on the network.
- send: send a prompt to a named peer. Returns a message ID.
- await: blocking wait on a specific message ID for the peer's reply.
- check: non-blocking poll. Use when an agent should keep working while a peer thinks.
That is the entire surface. Two flavors of the extension ship in the speaker's public repo: comms (single-device, in-process pool) and comms-net (a lightweight Bun HTTP server that lets agents connect across machines). Both are deliberately simple. The recommendation is to read the code, then have your own agent adapt it for your security and topology requirements.
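To show how little is involved, here is a minimal in-process sketch of the four-tool surface, roughly the shape of a single-device pool. The PeerPool class, the PeerHandler type, and the stub peers are hypothetical, not the speaker's comms code. await is exposed as awaitReply to keep the method name unambiguous, and there is no authentication at all, which a networked version would need.

```typescript
// Hypothetical in-process pool exposing list, send, await, and check.
type PeerHandler = (from: string, prompt: string) => Promise<string>;

class PeerPool {
  private readonly peers = new Map<string, PeerHandler>();
  private readonly pending = new Map<string, Promise<string>>();
  private readonly settled = new Map<string, string>();
  private nextId = 1;

  join(name: string, handler: PeerHandler): void {
    this.peers.set(name, handler);
  }

  // list: enumerate the other agents currently in the pool.
  list(self: string): string[] {
    return Array.from(this.peers.keys()).filter((n) => n !== self);
  }

  // send: hand a prompt to a named peer and return a message ID right away.
  send(from: string, to: string, prompt: string): string {
    const handler = this.peers.get(to);
    if (!handler) throw new Error(`unknown peer: ${to}`);
    const id = `msg-${this.nextId++}`;
    const reply = handler(from, prompt);
    // Record the answer for check(); failures surface through awaitReply().
    reply.then((text) => { this.settled.set(id, text); }, () => undefined);
    this.pending.set(id, reply);
    return id;
  }

  // await (named awaitReply here): resolves when the peer's reply arrives.
  awaitReply(id: string): Promise<string> {
    return this.pending.get(id) ?? Promise.reject(new Error(`unknown message: ${id}`));
  }

  // check: non-blocking poll; undefined means the peer is still working.
  check(id: string): string | undefined {
    return this.settled.get(id);
  }
}

const pool = new PeerPool();
// Stub peers: a prod agent that answers with redacted data, and a dev agent.
pool.join("prod", async (_from, prompt) => `redacted rows for: ${prompt}`);
pool.join("dev", async () => "ack");

const msgId = pool.send("dev", "prod", "accounts hitting the Pro-tier lockout");
console.log(pool.list("dev"));  // only "prod" is visible to dev
console.log(pool.check(msgId)); // undefined: the reply has not settled yet
pool.awaitReply(msgId).then((text) => console.log(text));
```

Nothing here decides who leads. Either peer can call send on the other, which is what makes the topology flat rather than hub-and-spoke.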
The shipping extensions are reference implementations. The talk is explicit: secure them and harden them for your specific use case. Anything that accepts network requests from other agents is a permission boundary you need to take seriously.
Demo 1: PII-safe production-to-dev workflow
The first demo is a routine engineering problem with a real constraint: a Pro-tier user is hitting a lockout bug in production; the fix requires reproducing it locally; production contains personally identifiable information that must not leak. Setup:
- Prod agent on a Mac mini hosting the production database. Knows the schema, knows what is PII, will not expose it.
- Dev agent on a laptop. Job: bring the affected slice over with PII stripped so an engineer can reproduce locally.
The dev agent sees the prod peer on the network, sends a request for the affected slice, and awaits the reply on the returned message ID. The prod agent fetches, redacts, and replies. They iterate until the dev agent confirms a clean repro. No orchestrator. No human in the loop on the back-and-forth. The boundary is enforced by the prod agent's own instructions and by the harness, not by a separate access-control layer.
Demo 2: feature-parity between two cloud sandboxes
The second demo is a research-and-build task: produce a new Pi skill for exe.dev that matches the feature surface of an existing E2B skill. Setup:
- E2B agent: holds the canonical E2B skill, has refreshed its context on every feature, command, and quirk. Answers questions from its peer.
- exe.dev agent: drives the build. Reads the exe.dev docs, asks the E2B agent targeted questions to confirm parity, produces the new skill plus a feature-parity document that flags differences.
The result the talk highlights: ten corrections came out of the exchange. These were claims the E2B agent might have gotten wrong working alone, caught because its peer was actively validating them. This is the verifier pattern applied laterally between peers, not vertically from supervisor to worker.
You could put both tasks in one agent. The talk argues you should not: a focused context window is a more reliable context window. Its claim is that two agents at 200k tokens each, each focused on one tool, outperform one agent at 400k spanning both. The lesson holds independently of Pi: scope each agent's context to one focused task instead of letting it grow to span several.
Pros and cons, stated honestly
The talk ends each pattern with a trade-off section. This is the one for peer-to-peer.
Pros:
- End-to-end customizable. You own the protocol because you own the harness.
- Flat by construction. No information loss in the chain of command, because there is no chain.
- Primitive over composition. Once you have one agent, you can compose any number. Composition is an engineering pattern; primitives are what make it cheap.

Cons:
- Loops are possible. Sloppy prompts produce sloppy back-and-forths and burn tokens. Define an end state.
- Cost scales linearly with agent count and with each communication bounce. There is a useful upper bound; past it, more agents stop helping.
- Easy to slip back into orchestration. If you find one peer doing all the directing, you have an orchestrator with extra steps. That is fine if it is what you need; just be honest about it.
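One way to honor "define an end state" is to make the end state a function and the bounce budget a hard cap, so a sloppy exchange stops on its own. A minimal synchronous sketch; converse, its parameters, and the feature list are hypothetical, and ask stands in for a send followed by an await to a peer.

```typescript
// A peer exchange with an explicit end state and a hard cap on bounces.
// `next` returns the next question, or undefined once the end state is reached.
type StopReason = "end-state" | "budget";

function converse(
  ask: (question: string) => string,
  next: (lastAnswer: string | undefined) => string | undefined,
  maxBounces: number,
): { answers: string[]; stoppedBy: StopReason } {
  const answers: string[] = [];
  let last: string | undefined;
  for (;;) {
    const question = next(last);
    // The end state wins over the budget when both are reached together.
    if (question === undefined) return { answers, stoppedBy: "end-state" };
    if (answers.length >= maxBounces) return { answers, stoppedBy: "budget" };
    last = ask(question);
    answers.push(last);
  }
}

// Hypothetical parity check: each question verifies one feature with the peer.
const unverified = ["create sandbox", "run command", "upload file"];
const result = converse(
  (q) => `peer confirms: ${q}`, // stub peer
  () => unverified.shift(),     // end state: nothing left to verify
  10,
);
console.log(result.stoppedBy, result.answers.length); // end-state 3
```

A run that stops with "budget" is itself a signal: either the prompt never defined a reachable end state, or the task needs a different decomposition.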
How this connects back to the five pillars
- Harness ownership (Pillar 1): the entire pattern is unavailable inside a rented product. Owning Pi means you can add a four-tool extension and it works.
- Software factory (Pillar 2): peer-to-peer is a topology for the factory floor. Specialized peers replace a single overloaded worker.
- Extensible software (Pillar 3): the comms layer ships as an extension, not a core change. The same harness that runs single-agent runs the network pool.
- Always-on agents (Pillar 4): a verifier peer always listening for messages is a low-cost AFK agent that earns its tokens.
- Agentic access (Pillar 5): the network is now an API the agent reaches over. Other agents become a tool surface.
The reference extensions live in the speaker's "Pi vs Claude Code" codebase (linked from his channel; see agenticengineer.com). Pi itself is at pi.dev. For the sandbox tools used in Demo 2: E2B and exe.dev. The "verifier pattern" referenced in passing is documented in the speaker's prior video on validator agents (linked from his channel).
One pull-quote to take with you
That is overstatement on purpose. The honest read: a harness you can extend in an afternoon expands the space of patterns you will even try. Most engineers never try peer-to-peer because their tool does not let them.
The Pi coding agent
Pi is the talk's worked example of an extensible, ownable harness. This chapter is a structured reference, not a tutorial. For the full surface area see the Pi docs and the source on GitHub.
What Pi is
Pi is a minimal terminal coding harness built by Earendil Inc. (lead author: Mario Zechner). The tagline on pi.dev is "There are many agent harnesses, but this one is yours." The thesis: ship primitives, not features. Anything Pi does not include can be built as an extension or installed from a third-party package.
Why it shows up in the talk
Pi is the speaker's daily driver and the reason he can claim to be "building one new custom agent harness every single day." A composable harness reduces the cost of a custom harness from "fork the product" to "write an extension."
Surface area, in one page
- Modes. Interactive TUI; print/JSON for scripts (pi -p "query"); RPC over stdin/stdout for non-Node integrations; SDK for embedding in apps.
- Providers and models. 15+ providers, hundreds of models: Anthropic, OpenAI, Google, Azure, Bedrock, Mistral, Groq, Cerebras, xAI, Hugging Face, Kimi For Coding, MiniMax, OpenRouter, Ollama. Mid-session model switch with /model or Ctrl+L. Custom providers via models.json.
- Sessions. Tree-structured. Navigate with /tree. Export with /export. Share via /share to a gist-backed URL. Example session.
- Context engineering. Minimal system prompt by design. Project instructions via AGENTS.md. Per-project override via SYSTEM.md. Customizable compaction. Skills (on-demand capability packages). Prompt templates (reusable Markdown prompts, invoked with /name). Dynamic context injection via extensions.
- Extensions. TypeScript modules with access to tools, commands, keyboard shortcuts, events, and the full TUI. 50+ examples in the repo.
- Steering. Enter sends a steering message (delivered after the current tool, interrupts the rest). Alt+Enter queues a follow-up that waits until the agent finishes.
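The two steering keys imply two different queues. A minimal sketch of those semantics, reading "interrupts the rest" as skipping the remaining tool calls from the current turn; the InputQueues class and its method names are hypothetical, not Pi internals.

```typescript
// Two queues for input typed while the agent is busy (hypothetical names).
class InputQueues {
  private readonly steering: string[] = [];
  private readonly followUps: string[] = [];

  steer(text: string): void {    // Enter
    this.steering.push(text);
  }
  followUp(text: string): void { // Alt+Enter
    this.followUps.push(text);
  }

  // Called after each tool finishes. Pending steering skips the rest of this
  // turn's tool calls and is injected as the next user message.
  afterTool(remainingCalls: string[]): { skipped: string[]; inject?: string } {
    if (this.steering.length === 0) return { skipped: [] };
    return { skipped: remainingCalls, inject: this.steering.splice(0).join("\n") };
  }

  // Called when the agent would otherwise stop: the oldest follow-up starts a new turn.
  onIdle(): string | undefined {
    return this.followUps.shift();
  }
}

const queues = new InputQueues();
queues.followUp("then update the README");
queues.steer("stop, use the v2 endpoint instead");
// The first tool just finished; "edit" and "bash" were still queued for this turn.
const afterFirstTool = queues.afterTool(["edit", "bash"]);
console.log(afterFirstTool);
```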
What Pi explicitly does not include
From the homepage, deliberately omitted features and the recommended workaround for each:
- No MCP. Build CLI tools with READMEs (Pi's "skills"), or add MCP via an extension.
- No sub-agents. Spawn Pi instances via tmux, or build via an extension.
- No permission popups. Run in a container, or build a confirmation flow via an extension.
- No plan mode. Write plans to files, or build with an extension.
- No built-in to-dos. Use a TODO.md file.
- No background bash. Use tmux for full observability and direct interaction.
The talk pronounces it "Pi" (as in π). It is sometimes written informally as "py" in transcripts because of the sound. The package is @earendil-works/pi-coding-agent on npm. The domain is pi.dev.
Install
From the homepage, the shell installer:
```sh
curl -fsSL https://pi.dev/install.sh | sh
```
Installs via npm, pnpm, bun, and PowerShell are also supported. See the docs for current instructions.
The design rationale lives in Mario Zechner's launch post and his "What if you don't need MCP?" essay. Community is on Discord. Package directory at pi.dev/packages. License is MIT.
Harness architecture from first principles
Strip away the branding. An agent harness is six pieces wired together: a model client, a tool registry, a context strategy, a message store, an extension bus, and a UI surface. This chapter defines each piece in terms that do not depend on Pi. The next chapter shows how Pi instantiates them.
The agent loop, as an algorithm
Every coding agent runs the same loop. Naming and storage differ; the shape does not.
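```typescript
// The universal agent loop.
// Minimal shapes for the pieces it touches; each is expanded under "The six pieces in detail".
// Streaming is elided: complete() is assumed to resolve to the assembled assistant message.
type TextBlock = { type: "text"; text: string };
type ToolCallBlock = { type: "toolCall"; id: string; name: string; arguments: Record<string, unknown> };
type ContentBlock = TextBlock | { type: "thinking"; thinking: string } | ToolCallBlock;
interface ToolResultLike { content: TextBlock[]; details?: unknown; isError?: boolean }
interface AgentState {
  systemPrompt: string;
  session: { append(entry: Record<string, unknown>): void };
  contextStrategy: { build(session: AgentState["session"], systemPrompt: string): unknown[] };
  model: {
    complete(args: { messages: unknown[]; tools: unknown[]; signal: AbortSignal }):
      Promise<{ content: ContentBlock[]; usage: unknown }>;
  };
  tools: { activeSchemas(): unknown[]; execute(call: ToolCallBlock, signal: AbortSignal): Promise<ToolResultLike> };
  hooks: {
    fire(event: "tool_call", call: ToolCallBlock): Promise<{ block: boolean; reason: string } | undefined>;
    fire(event: "tool_result", result: ToolResultLike): Promise<ToolResultLike | undefined>;
  };
  abortSignal: AbortSignal;
}

async function agentTurn(state: AgentState): Promise<AgentState> {
  while (true) {
    // 1. Build the context the model will see
    const messages = state.contextStrategy.build(state.session, state.systemPrompt);
    // 2. Call the model with available tools
    const response = await state.model.complete({
      messages,
      tools: state.tools.activeSchemas(),
      signal: state.abortSignal,
    });
    // 3. Persist the assistant message
    state.session.append({ role: "assistant", content: response.content, usage: response.usage });
    // 4. If no tool calls, we are done
    const toolCalls = response.content.filter((c): c is ToolCallBlock => c.type === "toolCall");
    if (toolCalls.length === 0) return state;
    // 5. Execute each tool call (after preflight hooks)
    for (const call of toolCalls) {
      const blockResult = await state.hooks.fire("tool_call", call);
      if (blockResult?.block) {
        // A blocked call becomes an error result the model can read, not an exception
        state.session.append({ role: "toolResult", toolCallId: call.id,
          content: [{ type: "text", text: blockResult.reason }], isError: true });
        continue;
      }
      const result = await state.tools.execute(call, state.abortSignal);
      const patched = (await state.hooks.fire("tool_result", result)) ?? result;
      state.session.append({ role: "toolResult", toolCallId: call.id, ...patched });
    }
    // 6. Loop back: the model will likely respond to the tool results
  }
}
```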
Read it twice. Everything else in this manual is structure around this loop. The model is the engine, the loop is the crankshaft, the rest is gearing.
The six pieces in detail
1. Model client
A typed wrapper over one provider's HTTP API. It accepts a normalized message array and a tool schema, returns a stream of content blocks (text, thinking, toolCall) plus token usage. The minimum surface:
```typescript
interface ModelClient {
  readonly provider: string;
  readonly id: string;
  readonly contextWindow: number;
  readonly capabilities: { reasoning: boolean; vision: boolean; toolUse: boolean };
  complete(args: {
    messages: NormalizedMessage[];
    tools: ToolSchema[];
    systemPrompt?: string;
    thinkingLevel?: ThinkingLevel;
    signal?: AbortSignal;
  }): AsyncIterable<StreamEvent>;
}
```
Pi separates this into an API kind (anthropic-messages, openai-completions, openai-responses, etc.) and a provider (Anthropic, OpenAI, OpenRouter, Bedrock, Ollama, ...). Providers register models; models pick an API kind. This is why Pi supports 15+ providers without 15+ adapter rewrites: there are only ~5 wire formats.
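The provider/API-kind split means one adapter per wire format, with providers reduced to registry entries. A minimal sketch that only shapes request URLs and bodies: the three API-kind strings echo those above, the endpoint paths are the public Anthropic Messages, OpenAI Chat Completions, and OpenAI Responses routes, and the model IDs, types, and function names are placeholders rather than Pi's code.

```typescript
// Few wire formats, many providers. Everything except the API-kind strings
// and public endpoint paths is hypothetical.
type ApiKind = "anthropic-messages" | "openai-completions" | "openai-responses";

interface ModelSpec {
  provider: string;
  id: string;
  api: ApiKind;
  baseUrl: string;
}

interface HttpRequest {
  url: string;
  body: Record<string, unknown>;
}

// One adapter per wire format. Real adapters also handle auth headers,
// tool schemas, streaming, and response parsing; this only shapes the request.
const adapters: Record<ApiKind, (m: ModelSpec, prompt: string) => HttpRequest> = {
  "anthropic-messages": (m, prompt) => ({
    url: `${m.baseUrl}/v1/messages`,
    // max_tokens is required by this API; 1024 is an arbitrary example value
    body: { model: m.id, max_tokens: 1024, messages: [{ role: "user", content: prompt }] },
  }),
  "openai-completions": (m, prompt) => ({
    url: `${m.baseUrl}/v1/chat/completions`,
    body: { model: m.id, messages: [{ role: "user", content: prompt }] },
  }),
  "openai-responses": (m, prompt) => ({
    url: `${m.baseUrl}/v1/responses`,
    body: { model: m.id, input: prompt },
  }),
};

// Adding a provider is a registry entry, not a new adapter.
const models: ModelSpec[] = [
  { provider: "anthropic", id: "model-a", api: "anthropic-messages", baseUrl: "https://api.anthropic.com" },
  { provider: "openai", id: "model-b", api: "openai-responses", baseUrl: "https://api.openai.com" },
  { provider: "openrouter", id: "model-c", api: "openai-completions", baseUrl: "https://openrouter.ai/api" },
  { provider: "ollama", id: "model-d", api: "openai-completions", baseUrl: "http://localhost:11434" },
];

function buildRequest(model: ModelSpec, prompt: string): HttpRequest {
  return adapters[model.api](model, prompt);
}

console.log(buildRequest(models[3], "hello").url);
```

Four providers, three adapters: OpenRouter and Ollama both reuse the Chat Completions shape, which is the economy the paragraph describes.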
2. Tool registry
A dictionary of named functions exposed to the model, each with a JSONSchema-typed parameter set and an executor.
```typescript
interface ToolDefinition<P> {
  name: string;           // canonical name (lowercase, snake_case)
  label: string;          // human label for UI
  description: string;    // shown to the model
  parameters: JSONSchema; // validated before execute()
  execute(
    toolCallId: string,
    params: P,
    signal: AbortSignal,
    onUpdate?: (partial: ToolResult) => void, // streaming progress
    ctx?: ToolContext
  ): Promise<ToolResult>;
}

interface ToolResult {
  content: ContentBlock[]; // text or image
  details?: unknown;       // arbitrary metadata, not sent to LLM
  isError?: boolean;
}
```
Two design choices matter. First, the parameter schema goes to the model verbatim — the model decides what arguments to send based on the schema's description fields. Vague schemas produce vague calls. Second, details is for the UI and for downstream extensions; the LLM only sees content.
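The sketch below makes the content/details split concrete. It shows a conversion a loop might apply before a result reaches the model. The message shape and both helper names are assumptions for illustration, not Pi's types.

```typescript
// Hypothetical shapes mirroring ToolResult above.
type TextBlock = { type: "text"; text: string };
type ImageBlock = { type: "image"; data: string; mimeType: string };
type ContentBlock = TextBlock | ImageBlock;

interface ToolResult {
  content: ContentBlock[];
  details?: unknown; // for UI and extensions only
  isError?: boolean;
}

interface ToolResultMessage {
  role: "toolResult";
  toolCallId: string;
  content: ContentBlock[];
  isError: boolean;
}

// What the model sees: content and the error flag, never details.
function toModelMessage(toolCallId: string, result: ToolResult): ToolResultMessage {
  return { role: "toolResult", toolCallId, content: result.content, isError: result.isError ?? false };
}

// What a renderer might show: the text plus whatever details carries.
function toUiSummary(result: ToolResult): string {
  const text = result.content
    .filter((b): b is TextBlock => b.type === "text")
    .map(b => b.text)
    .join("\n");
  return result.details === undefined ? text : `${text}\n[details] ${JSON.stringify(result.details)}`;
}
```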
3. Context strategy
A pure function that takes the current session and produces the message list the model will see. The naive version is "return all messages." The realistic version handles compaction, branch summaries, system-prompt assembly, and tool-result truncation.
interface ContextStrategy {
build(session: SessionStore, systemPrompt: string): NormalizedMessage[];
estimateTokens(messages: NormalizedMessage[], model: ModelClient): number;
shouldCompact(used: number, window: number, reserve: number): boolean;
}
Pi's default reserves 16,384 tokens for the response, keeps the most recent ~20,000 tokens of conversation verbatim, summarizes the rest into a CompactionEntry, and rebuilds the context from [system, summary, kept...]. See the compaction docs for the exact algorithm. Chapter 12 of this wiki walks through it.
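The trigger is a single comparison. The sketch below uses the 16,384-token default reserve quoted above. The helper names are hypothetical, and the 200,000-token window is only an example value.

```typescript
const DEFAULT_RESERVE_TOKENS = 16_384; // room left for the model's response

// Highest token count the context may reach before compaction kicks in.
function compactionThreshold(window: number, reserve = DEFAULT_RESERVE_TOKENS): number {
  return window - reserve;
}

function shouldCompact(used: number, window: number, reserve = DEFAULT_RESERVE_TOKENS): boolean {
  return used > compactionThreshold(window, reserve);
}

// Worked example for a 200,000-token window: 200,000 - 16,384 = 183,616
console.log(compactionThreshold(200_000)); // 183616
```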
4. Message store (session)
An append-only log of typed entries. Entries have parent pointers so the log is actually a tree — branching is a first-class operation, not a fork-the-file workaround. Pi stores it as JSONL with one entry per line; reconstruction is a single pass.
interface SessionEntry {
type: string; // "message" | "compaction" | "model_change" | ...
id: string; // 8-char hex
parentId: string | null; // null for root
timestamp: string; // ISO
}
interface SessionStore {
append(entry: Omit<SessionEntry, "id" | "parentId" | "timestamp">): string;
getLeafId(): string;
getEntry(id: string): SessionEntry | undefined;
getBranch(fromId?: string): SessionEntry[]; // root → leaf
branch(toEntryId: string): void; // move leaf back
getChildren(parentId: string): SessionEntry[];
}
The session is the source of truth for everything you can replay: model changes, tool calls, compactions, even extension state. Chapter 12 documents Pi's entry types in full.
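The sketch below is a minimal in-memory version of the SessionStore contract. It shows how parent pointers turn an append-only log into a tree. It is not Pi's SessionManager, and it differs from the interface above in a few ways:
- There is no JSONL persistence.
- getLeafId returns null for an empty session.
- The class name and the payload field (text) are hypothetical.

```typescript
import { randomBytes } from "node:crypto";

// Hypothetical entry shape: the four base fields plus arbitrary payload.
interface SessionEntry {
  type: string;
  id: string;
  parentId: string | null;
  timestamp: string;
  [payload: string]: unknown;
}

class InMemorySession {
  private entries = new Map<string, SessionEntry>();
  private leafId: string | null = null;

  // Every append becomes a child of the current leaf, then the new leaf.
  append(entry: { type: string; [payload: string]: unknown }): string {
    let id: string;
    do {
      id = randomBytes(4).toString("hex"); // 8-char hex
    } while (this.entries.has(id));
    this.entries.set(id, { ...entry, id, parentId: this.leafId, timestamp: new Date().toISOString() });
    this.leafId = id;
    return id;
  }

  getLeafId(): string | null {
    return this.leafId;
  }

  getEntry(id: string): SessionEntry | undefined {
    return this.entries.get(id);
  }

  // Follow parent pointers up to the root, then reverse: root → leaf.
  getBranch(fromId: string | null = this.leafId): SessionEntry[] {
    const path: SessionEntry[] = [];
    for (let cur = fromId; cur !== null; ) {
      const e = this.entries.get(cur);
      if (!e) throw new Error(`Unknown entry ${cur}`);
      path.push(e);
      cur = e.parentId;
    }
    return path.reverse();
  }

  // Branching only moves the leaf; the next append starts a sibling path.
  branch(toEntryId: string): void {
    if (!this.entries.has(toEntryId)) throw new Error(`Unknown entry ${toEntryId}`);
    this.leafId = toEntryId;
  }

  getChildren(parentId: string): SessionEntry[] {
    return Array.from(this.entries.values()).filter(e => e.parentId === parentId);
  }
}

// Usage: branch back to the first entry and try again.
const session = new InMemorySession();
const root = session.append({ type: "message", text: "hi" });
const first = session.append({ type: "message", text: "first try" });
session.branch(root);
const second = session.append({ type: "message", text: "second try" });
console.log(session.getBranch().map(e => e.text)); // [ 'hi', 'second try' ]
```

Nothing is rewritten when you branch. The first attempt is still stored, and getBranch(first) still reaches it.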
5. Extension bus (hooks)
A typed pub/sub layered over the loop. Extensions subscribe to lifecycle events; the loop awaits handlers and respects their return values. The contract every harness eventually converges on:
type Hook =
| "session_start" | "session_shutdown"
| "before_agent_start" | "agent_start" | "agent_end"
| "turn_start" | "turn_end"
| "context" // mutate messages before send
| "before_provider_request" // mutate raw provider payload
| "after_provider_response" // inspect HTTP response
| "tool_call" // block or mutate input
| "tool_result" // mutate output
| "user_bash" // intercept ! and !! commands
| "input" // intercept user input
| "model_select" | "thinking_level_select"
| "session_before_compact" | "session_compact"
| "session_before_tree" | "session_tree"
| "session_before_fork" | "session_before_switch";
interface HookBus {
on<E extends Hook>(event: E, handler: HookHandler<E>): Disposable;
fire<E extends Hook>(event: E, payload: HookPayload<E>): Promise<HookResult<E>>;
}
This is the architectural lever that makes harness ownership cheap. Adding a new behavior is "subscribe to one hook and return a value" rather than "fork the loop."
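The sketch below shows the chaining semantics in simplified form. It drops the per-event typing of HookBus and assumes one merge rule for every event: returned objects are shallow-merged into the payload, and block: true ends the chain. In Pi, the effect of a return value is defined per event. SimpleHookBus is a hypothetical name.

```typescript
// Simplified sketch: untyped events, one merge rule for every event.
type Payload = Record<string, unknown>;
type Handler = (payload: Payload) => unknown;

interface Disposable {
  dispose(): void;
}

class SimpleHookBus {
  private handlers = new Map<string, Handler[]>();

  on(event: string, handler: Handler): Disposable {
    const list = this.handlers.get(event) ?? [];
    list.push(handler);
    this.handlers.set(event, list);
    return {
      dispose: () => {
        const i = list.indexOf(handler);
        if (i >= 0) list.splice(i, 1);
      },
    };
  }

  // Handlers run in registration (load) order and are awaited one at a time.
  // A returned object is shallow-merged, so later handlers see earlier patches.
  // A merged block: true stops the chain.
  async fire(event: string, payload: Payload): Promise<Payload> {
    let current = payload;
    for (const handler of [...(this.handlers.get(event) ?? [])]) {
      const result = await handler(current);
      if (result !== null && typeof result === "object") {
        current = { ...current, ...(result as Payload) };
        if (current.block === true) break;
      }
    }
    return current;
  }
}

// Usage: a guard that blocks rm -rf, then a handler that rewrites allowed commands.
const bus = new SimpleHookBus();
bus.on("tool_call", p =>
  /\brm\s+-rf\b/.test(String(p.command)) ? { block: true, reason: "rm -rf is not allowed" } : undefined,
);
bus.on("tool_call", p => ({ command: `set -e; ${String(p.command)}` }));
```

Because fire returns the merged payload, it fits the loop at the top of this chapter unchanged. That loop checks blockResult?.block and falls back with ?? result.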
6. UI surface
The terminal is the canonical Pi target, but the abstraction is wider: a UI surface is anything that can show messages, accept input, render tool calls, and prompt the user for confirmation. Pi exposes four UI modes — interactive TUI, print/JSON for scripts, RPC for subprocess clients, and an SDK for embedding — all served by the same loop, the same session store, and the same extensions.
A session is a tree of entries on disk. A model client streams content from a provider. A tool registry exposes typed functions to the model. A context strategy decides what slice of the session goes into each model call. An extension bus lets you intercept every step. A UI surface renders the loop to a human or a program. The agent loop wires all six together. That is the entire harness.
Where Pi made specific choices
| Piece | Pi's choice | Rationale (from the docs) |
|---|---|---|
| Model client | One api string per provider (anthropic-messages, openai-completions, ...). Custom providers via pi.registerProvider(). | Most providers map onto ~5 wire formats. Treat the wire format as the abstraction, the provider as configuration. |
| Tool registry | JSONSchema via TypeBox. Tools defined with defineTool(); extensions register at any time via pi.registerTool(). | Schemas are part of the prompt the model sees. TypeBox gives you static types and runtime validation from one definition. |
| Context strategy | Reserve 16,384 for response; keep 20,000 most recent; summarize the rest. Customizable per project, replaceable via extension. | Default that works; escape hatch that does not require forking. |
| Message store | JSONL tree with 8-char hex IDs and parentId links. Versioned (currently v3). | Append-only is robust. Trees enable in-place branching without copying files. |
| Extension bus | 30+ typed events. Handlers chain in load order. Some can block or mutate. | Cover every interesting decision point with a hook so the core never has to know about the feature you want to add. |
| UI surface | Interactive TUI, print/JSON, RPC over stdin/stdout JSONL, SDK for embedding. | Four shapes is enough to cover human terminals, shell scripts, language-agnostic clients, and same-process embedding. |
The agent loop is small. The session is small. The model client is small. The size of a useful harness comes from the hooks, because hooks are where features that other tools bake in become things you compose. This is the architectural translation of "primitives over features."
The Pi extension API
Every behavior the talk attributes to harness ownership eventually reduces to writing one of these. This chapter is the full reference, drawn from the Pi extensions docs, with types and the events that matter most.
The minimum extension
A Pi extension is a TypeScript module with a default-exported factory. Pi loads it via jiti, so there is no compile step. The factory receives ExtensionAPI; that is the entire injection.
// ~/.pi/agent/extensions/hello.ts
import type { ExtensionAPI } from "@earendil-works/pi-coding-agent";
export default function (pi: ExtensionAPI) {
pi.on("session_start", async (_event, ctx) => {
ctx.ui.notify("Extension loaded!", "info");
});
}
Save the file. Pi auto-discovers it on next launch. To run with an extension without installing globally:
pi -e ./hello.ts
Discovery: where Pi looks
| Path | Scope |
|---|---|
| ~/.pi/agent/extensions/*.ts | Global, all projects |
| ~/.pi/agent/extensions/*/index.ts | Global, multi-file extensions |
| .pi/extensions/*.ts | Project-local, checked into git |
| settings.json → packages: ["npm:..."] | Shared via npm / git |
| --extension path CLI flag | One-off without installing |
Async factories for setup work
If the factory returns a Promise, Pi awaits it before session_start fires. Use this to fetch remote configuration or discover models, so they are available immediately (including to pi --list-models).
import type { ExtensionAPI } from "@earendil-works/pi-coding-agent";
export default async function (pi: ExtensionAPI) {
  const r = await fetch("http://localhost:1234/v1/models");
  if (!r.ok) throw new Error(`Model discovery failed: HTTP ${r.status}`);
  const { data } = (await r.json()) as { data: Array<{ id: string; context_window?: number }> };
  pi.registerProvider("local-openai", {
    baseUrl: "http://localhost:1234/v1",
    apiKey: "LOCAL_OPENAI_API_KEY",
    api: "openai-completions",
    models: data.map(m => ({
      id: m.id, name: m.id, reasoning: false, input: ["text"],
      cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
      contextWindow: m.context_window ?? 128000, maxTokens: 4096,
    })),
  });
}
The event lifecycle
From the Pi docs, the order of events around a single user prompt:
pi starts
├─► session_start { reason: "startup" }
└─► resources_discover { reason: "startup" }
user sends prompt
├─► (extension commands checked first, bypass loop if found)
├─► input (intercept / transform / handle)
├─► (skill and template expansion if not handled)
├─► before_agent_start (inject message, modify system prompt)
├─► agent_start
│
│ ┌─── turn (repeats while LLM calls tools) ───────────┐
│ │ turn_start │
│ │ context (mutate messages) │
│ │ before_provider_request (replace payload) │
│ │ after_provider_response (inspect headers) │
│ │ message_start / message_update / message_end │
│ │ tool_execution_start │
│ │ tool_call (BLOCK or mutate) │
│ │ tool_execution_update │
│ │ tool_result (mutate output) │
│ │ tool_execution_end │
│ │ turn_end │
│ └────────────────────────────────────────────────────┘
└─► agent_end
Three properties matter. (1) Handlers run in extension load order. (2) Mutations chain — later handlers see earlier handlers' changes. (3) Some events block; some can return a replacement payload; most are notification-only. The return value's effect is part of the event's contract, not a global rule.
The high-value hooks, with signatures
tool_call — preflight, block, or mutate
Fired after tool_execution_start, before the tool runs. The handler can mutate event.input in place (later handlers and the tool itself see the mutation, no re-validation) and can return { block: true, reason } to short-circuit.
import { isToolCallEventType } from "@earendil-works/pi-coding-agent";
pi.on("tool_call", async (event, ctx) => {
if (isToolCallEventType("bash", event)) {
// Add a profile sourcing prefix to every shell command
event.input.command = `source ~/.profile\n${event.input.command}`;
if (/\brm\s+-rf\b/.test(event.input.command)) {
const ok = await ctx.ui.confirm("Dangerous!", "Allow rm -rf?");
if (!ok) return { block: true, reason: "User declined" };
}
}
});
tool_result — middleware over outputs
Fired after the tool returns and before the result message is appended to the session. Handlers chain like middleware; each sees the latest patched result.
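import { isBashToolResult } from "@earendil-works/pi-coding-agent";
pi.on("tool_result", async (event, ctx) => {
  if (!isBashToolResult(event)) return;
  // Send to a redaction service before the LLM sees it
  const r = await fetch("https://redactor.internal/scrub", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ content: event.content }),
    signal: ctx.signal,
  });
  // Fail closed: if redaction fails, withhold the raw output instead of passing it through
  if (!r.ok) {
    return { content: [{ type: "text", text: "[output withheld: redaction failed]" }], isError: true };
  }
  const { content } = await r.json();
  return { content }; // partial patch; details / isError unchanged
});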
before_agent_start — modify system prompt or inject a message
pi.on("before_agent_start", (event, ctx) => {
const current = ctx.getSystemPrompt();
return { systemPrompt: current + "\n\nNever modify files in /etc." };
});
context — last-mile message mutation
pi.on("context", (event, ctx) => {
  // event.messages is the array about to go to the model.
  // Filter, mutate, or replace it; shouldHide is your own predicate, not a Pi export.
  return { messages: event.messages.filter(m => !shouldHide(m)) };
});
input — intercept user text before processing
pi.on("input", async (event, ctx) => {
if (event.text.startsWith("?quick ")) {
return { action: "transform", text: `Respond briefly: ${event.text.slice(7)}` };
}
if (event.text === "ping") {
ctx.ui.notify("pong", "info");
return { action: "handled" }; // skip agent entirely
}
return { action: "continue" };
});
session_before_compact — custom compaction
import { convertToLlm, serializeConversation } from "@earendil-works/pi-coding-agent";
pi.on("session_before_compact", async (event, ctx) => {
  const { preparation, signal } = event;
  const text = serializeConversation(convertToLlm(preparation.messagesToSummarize));
  // myCustomModel stands in for whatever summarizer you call; it is not part of Pi
  const summary = await myCustomModel.summarize(text, { signal });
  return {
    compaction: {
      summary,
      firstKeptEntryId: preparation.firstKeptEntryId,
      tokensBefore: preparation.tokensBefore,
    },
  };
});
ExtensionAPI methods, by purpose
Register things
pi.registerTool(definition) // LLM-callable tool, schema via TypeBox
pi.registerCommand(name, options) // Slash command: /name
pi.registerShortcut(keys, options) // Keyboard shortcut
pi.registerFlag(name, options) // CLI flag, read via pi.getFlag(name)
pi.registerProvider(name, config) // Model provider (with OAuth optional)
pi.registerMessageRenderer(type, renderer) // Custom TUI rendering
Talk to the agent
pi.sendMessage(message, options?) // Inject custom message into session
pi.sendUserMessage(content, options?) // Send a user message (triggers turn)
pi.appendEntry(customType, data?) // Persist extension state (no LLM context)
pi.setSessionName(name) // Display name for /resume
pi.setLabel(entryId, label?) // Bookmark/marker on an entry
Inspect or control the runtime
pi.getActiveTools() / pi.getAllTools() / pi.setActiveTools(names)
pi.setModel(model) / pi.setThinkingLevel(level)
pi.getCommands()
pi.exec(command, args, options?) // Run a shell command (typed result)
pi.events.on / pi.events.emit // Shared event bus for extension ↔ extension
Custom tools: a complete example
import { Type, type Static } from "typebox";
const greetSchema = Type.Object({
name: Type.String({ description: "Name to greet" }),
enthusiasm: Type.Optional(Type.Integer({ minimum: 0, maximum: 5, default: 1 })),
});
export type GreetInput = Static<typeof greetSchema>;
pi.registerTool({
name: "greet",
label: "Greet",
description: "Greet someone by name with controllable enthusiasm",
parameters: greetSchema,
promptSnippet: "Greet a person, optionally with extra enthusiasm",
promptGuidelines: [
"Use greet when the user explicitly asks for a salutation.",
"Use greet with enthusiasm=3 or higher only when the user signals it.",
],
async execute(toolCallId, params, signal, onUpdate, ctx) {
onUpdate?.({ content: [{ type: "text", text: "Composing greeting..." }] });
const bangs = "!".repeat(params.enthusiasm ?? 1);
return {
content: [{ type: "text", text: `Hello, ${params.name}${bangs}` }],
details: { name: params.name, enthusiasm: params.enthusiasm },
};
},
});
Note: promptSnippet opts the tool into the system prompt's "Available tools" section; promptGuidelines appends bullets to the "Guidelines" section. Guidelines are merged flat across all tools, so always name the tool in the guideline text ("Use greet when...", never "Use this tool when...").
Custom commands with autocomplete
import type { AutocompleteItem } from "@earendil-works/pi-tui";
pi.registerCommand("deploy", {
description: "Deploy to an environment",
getArgumentCompletions: (prefix: string): AutocompleteItem[] | null => {
const envs = ["dev", "staging", "prod"];
const items = envs
.filter(e => e.startsWith(prefix))
.map(value => ({ value, label: value }));
return items.length ? items : null;
},
handler: async (args, ctx) => {
await ctx.waitForIdle();
ctx.ui.notify(`Deploying: ${args}`, "info");
// ctx.fork / ctx.newSession / ctx.switchSession / ctx.navigateTree available here
},
});
State persistence
Two places to put state. Use tool_result.details if the state belongs to a specific tool invocation (this gives you correct behavior across branches and forks). Use pi.appendEntry(customType, data) for opaque extension state that you want to survive restarts. Recover on session_start by walking entries and filtering on customType.
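The sketch below covers the recovery step for opaque extension state. It rests on two assumptions:
- The extension saves a full snapshot with each appendEntry call.
- You walk the entries on the active branch (root to leaf), so the restored state matches the branch the user is on.

The Entry shape is a stand-in for Pi's custom entry, and todo-state is a hypothetical customType.

```typescript
// Stand-in for a session entry: only the fields this sketch reads.
interface Entry {
  type: string;
  id: string;
  customType?: string;
  data?: unknown;
}

interface TodoState {
  items: string[];
}

// Walk the active branch (root to leaf); the last matching snapshot wins.
function recoverState(branch: Entry[], customType = "todo-state"): TodoState {
  let state: TodoState = { items: [] };
  for (const entry of branch) {
    if (entry.type === "custom" && entry.customType === customType) {
      state = entry.data as TodoState;
    }
  }
  return state;
}
```

Storing snapshots rather than deltas keeps recovery to one pass in which the last match wins. The cost is larger entries.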
An extension is arbitrary TypeScript. Review every third-party extension you install. Use Pi's permission-gate and protected-paths examples as a baseline for sandboxing dangerous tools. The bash tool especially should be wrapped on any machine that touches production assets.
Extensions docs · 50+ example extensions · Keybindings · Themes
Sessions, compaction, and the tree
A Pi session is a JSONL file whose entries form a tree. Everything you can undo, branch, summarize, or replay lives there. This chapter is the file format, the algorithm that builds the model's context from the tree, and the compaction strategy that keeps long conversations within the window.
File layout
Sessions live at:
~/.pi/agent/sessions/--<path>--/<timestamp>_<uuid>.jsonl
where <path> is the working directory with / replaced by -. One file per session. Append-only on disk. The first line is the header; every subsequent line is an entry with a typed payload.
Header
// Version 3 header (current)
{ "type": "session", "version": 3, "id": "uuid",
"timestamp": "2024-12-03T14:00:00.000Z", "cwd": "/path/to/project" }
// Optional: parentSession when created via /fork or /clone
{ ... , "parentSession": "/path/to/original/session.jsonl" }
Versions: v1 was linear, v2 introduced the tree, v3 renamed the hookMessage role to custom for extension unification. Older sessions auto-migrate on load.
Entry shape
interface SessionEntryBase {
type: string; // "message" | "compaction" | "branch_summary" | ...
id: string; // 8-char hex
parentId: string | null; // null for the first entry after the header
timestamp: string; // ISO 8601
}
Entry types in production
| Type | What it carries | In LLM context? |
|---|---|---|
| message | A user, assistant, toolResult, bashExecution, custom, branchSummary, or compactionSummary message | Yes (depending on subtype) |
| model_change | Provider + modelId at the moment the user switched | No (state only) |
| thinking_level_change | New thinking level | No |
| compaction | summary, firstKeptEntryId, tokensBefore, optional details | The summary is; the original messages are not |
| branch_summary | Summary of an abandoned branch, with fromId back-reference | Yes, injected at navigation point |
| custom | Extension state. customType identifies the extension. data is arbitrary JSON | No |
| custom_message | Extension-injected message that DOES go in context | Yes |
| label | User bookmark on another entry | No |
| session_info | Display name for the session (latest wins) | No |
Two message types worth knowing
// Assistant content can mix text, thinking, and tool calls
interface AssistantMessage {
role: "assistant";
content: (TextContent | ThinkingContent | ToolCall)[];
api: string; provider: string; model: string;
usage: Usage; // tokens + cost
stopReason: "stop" | "length" | "toolUse" | "error" | "aborted";
timestamp: number;
}
// Bash executions from `!` commands sit in their own message type
interface BashExecutionMessage {
role: "bashExecution";
command: string; output: string;
exitCode: number | undefined;
cancelled: boolean; truncated: boolean;
fullOutputPath?: string; // when output overflowed
excludeFromContext?: boolean; // true for `!!` prefix
timestamp: number;
}
How a tree becomes a context
buildSessionContext() is the function that walks from the current leaf to the root and produces the message list the model sees. The algorithm:
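function buildSessionContext(session: SessionStore, systemPrompt: string): NormalizedMessage[] {
  const path = session.getBranch(); // [root, ..., leaf]
  const out: NormalizedMessage[] = [{ role: "system", content: systemPrompt }];
  // 1. If the path contains a compaction, use the most recent one
  const lastCompaction = [...path].reverse().find(e => e.type === "compaction") as
    (SessionEntry & { summary: string; firstKeptEntryId: string }) | undefined;
  if (lastCompaction) {
    out.push({ role: "user", content: `<summary>\n${lastCompaction.summary}\n</summary>` });
    // Then include messages from firstKeptEntryId forward. If that id is missing,
    // start at the compaction entry itself (slice(-1) would keep only the leaf).
    const keepFrom = path.findIndex(e => e.id === lastCompaction.firstKeptEntryId);
    const start = keepFrom >= 0 ? keepFrom : path.indexOf(lastCompaction);
    for (const e of path.slice(start)) appendIfMessage(out, e);
  } else {
    for (const e of path) appendIfMessage(out, e);
  }
  // 2. appendIfMessage (helper, not shown) turns message, branch_summary, and
  //    custom_message entries into model messages and skips state-only entries
  //    such as model_change, custom, label, and the compaction entry itself.
  return out;
}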
"Branching" never duplicates the file. To branch from an earlier entry, set the leaf back to that entry's id and append. The old branch still exists, just no longer on the active path. SessionManager.branch(entryId) does this; SessionManager.createBranchedSession(leafId) extracts a branch into a new file when you actually want to detach it.
Compaction in detail
Pi triggers compaction when
contextTokens > contextWindow - reserveTokens
with reserveTokens defaulting to 16,384 (configurable). You can also trigger it manually with /compact [instructions].
The algorithm:
- Walk backwards from the leaf, accumulating estimated tokens until keepRecentTokens (default 20,000) is reached. That is the cut point.
- Collect everything earlier on the active path, back to the previous compaction's firstKeptEntryId (or the start). Those are the messages to summarize.
- Call the model with a structured summary prompt (Goal / Constraints / Progress / Key Decisions / Next Steps / Critical Context, plus tagged <read-files> and <modified-files> lists).
- Append a CompactionEntry with the summary, the kept-from id, and the pre-compaction token count.
- Rebuild context from the summary plus the messages after the cut. The original earlier messages remain in the JSONL file but are no longer in context.
Cut-point rules: cut only at user, assistant, bashExecution, or custom messages. Never cut at a toolResult (it must stay paired with its call). Long single turns ("split turns") are handled by summarizing the early part of the turn separately and merging the two summaries.
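The sketch below covers the backward walk and the toolResult rule. It assumes token counts are precomputed estimates, findCutPoint is a hypothetical name, and split-turn handling is left out. When the budget boundary lands on a toolResult, this sketch steps back to the assistant message that made the call. Pi's exact tie-breaking may differ.

```typescript
type Role = "user" | "assistant" | "toolResult" | "bashExecution" | "custom";

interface Msg {
  id: string;
  role: Role;
  tokens: number; // precomputed estimate
}

// Roles the kept range may start on; a toolResult must stay with its call.
const CUTTABLE: ReadonlySet<Role> = new Set<Role>(["user", "assistant", "bashExecution", "custom"]);

// Returns the index of the first message kept verbatim (0 means nothing to summarize).
function findCutPoint(msgs: Msg[], keepRecentTokens = 20_000): number {
  let acc = 0;
  for (let i = msgs.length - 1; i >= 0; i--) {
    acc += msgs[i].tokens;
    if (acc >= keepRecentTokens) {
      let j = i;
      // Landed on a toolResult: step back to the assistant message that made the call.
      while (j > 0 && !CUTTABLE.has(msgs[j].role)) j--;
      return j;
    }
  }
  return 0;
}
```

Worked example: take messages [user, assistant, toolResult, assistant, user] with 5k, 5k, 15k, 3k, and 2k tokens. Walking back from the end, the 20,000-token budget is reached at the toolResult (index 2). The cut therefore moves back to index 1, and the kept range starts at the assistant call.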
// CompactionEntry as it appears on disk
{
"type": "compaction",
"id": "f6g7h8i9",
"parentId": "e5f6g7h8",
"timestamp": "2024-12-03T14:10:00.000Z",
"summary": "## Goal\nUser wants to refactor auth...\n",
"firstKeptEntryId": "c3d4e5f6",
"tokensBefore": 50000,
"details": { "readFiles": [...], "modifiedFiles": [...] }
}
Branch summaries
When you navigate with /tree to a different branch, Pi offers to summarize what you are leaving behind so that context travels with you. Same summary format as compaction; the entry is branch_summary with a fromId pointing at the old leaf. File operations (read + modified) accumulate across nested branch summaries and compactions.
The structured summary format
## Goal
[What the user is trying to accomplish]
## Constraints & Preferences
- [Requirements mentioned by user]
## Progress
### Done
- [x] [Completed tasks]
### In Progress
- [ ] [Current work]
### Blocked
- [Issues, if any]
## Key Decisions
- **[Decision]**: [Rationale]
## Next Steps
1. [What should happen next]
## Critical Context
- [Data needed to continue]
<read-files>
path/to/file1.ts
</read-files>
<modified-files>
path/to/changed.ts
</modified-files>
Tool results are truncated to 2,000 characters during message serialization before summarization (long bash and read outputs would otherwise dominate the summary's token budget). The structured headings keep the model from treating the summary as a conversation to continue.
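The sketch below shows that truncation step. The 2,000-character limit comes from the text. The [role]: text layout, the truncation marker, and the function name are assumptions.

```typescript
const TOOL_RESULT_MAX_CHARS = 2_000;

interface SerializableMessage {
  role: string;
  text: string;
}

// Flatten messages for the summarizer, clipping only tool results.
function serializeForSummary(messages: SerializableMessage[]): string {
  return messages
    .map(m => {
      let text = m.text;
      if (m.role === "toolResult" && text.length > TOOL_RESULT_MAX_CHARS) {
        const dropped = text.length - TOOL_RESULT_MAX_CHARS;
        text = `${text.slice(0, TOOL_RESULT_MAX_CHARS)}\n[... ${dropped} more characters truncated]`;
      }
      return `[${m.role}]: ${text}`;
    })
    .join("\n\n");
}
```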
SessionManager API surface
// Construction (static)
SessionManager.create(cwd, sessionDir?)
SessionManager.open(path, sessionDir?)
SessionManager.continueRecent(cwd, sessionDir?)
SessionManager.inMemory(cwd?)
SessionManager.forkFrom(sourcePath, targetCwd, sessionDir?)
SessionManager.list(cwd, sessionDir?, onProgress?)
SessionManager.listAll(onProgress?)
// Instance: navigation
sm.getLeafId() / sm.getLeafEntry() / sm.getEntry(id)
sm.getBranch(fromId?) // path root → entry
sm.getTree() / sm.getChildren(parentId)
sm.branch(entryId) // move leaf back
sm.branchWithSummary(entryId, summary, details?, fromHook?)
sm.createBranchedSession(leafId) // extract to new file
// Instance: append (all return entry ID)
sm.appendMessage(message)
sm.appendModelChange(provider, modelId)
sm.appendThinkingLevelChange(level)
sm.appendCompaction(summary, firstKeptEntryId, tokensBefore, details?, fromHook?)
sm.appendCustomEntry(customType, data?) // state, not in context
sm.appendCustomMessageEntry(customType, content, display, details?) // in context
sm.appendLabelChange(targetId, label)
sm.appendSessionInfo(name)
// Instance: build the context the model sees
sm.buildSessionContext()
The session is the unit of replay. Anything that can be expressed as "append a typed line" is forward-compatible and recoverable from a partial write: an interrupted append can only damage the final line, and every entry before it stays readable. A database would add schema migrations, locking, and an opaque on-disk format. JSONL gives you tail -f and jq as debugging tools out of the box.
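The sketch below shows what "recoverable from a partial write" buys you. The reader drops a torn final line but still rejects corruption anywhere else in the file. It illustrates the property; it is not Pi's loader, and the function name is hypothetical.

```typescript
interface ParsedSession {
  header: unknown;
  entries: unknown[];
  droppedTail: boolean;
}

function parseSessionJsonl(text: string): ParsedSession {
  const lines = text.split("\n").filter(line => line.trim() !== "");
  const records: unknown[] = [];
  let droppedTail = false;
  for (let i = 0; i < lines.length; i++) {
    try {
      records.push(JSON.parse(lines[i]));
    } catch (err) {
      // Only the final line can be a torn append; earlier damage is real corruption.
      if (i !== lines.length - 1) throw err;
      droppedTail = true;
    }
  }
  return { header: records[0], entries: records.slice(1), droppedTail };
}
```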
Session format · Compaction · Source: session-manager.ts, compaction.ts.
Programmatic surfaces: SDK, RPC, JSON
Pi exposes four ways to drive the agent: the interactive TUI, the SDK (same Node process), RPC over stdin/stdout (subprocess), and a one-shot JSON event stream. Each is the same loop wearing a different jacket. This chapter is the reference for the three non-interactive ones.
Choosing a surface
| Surface | Use when | Process model |
|---|---|---|
| SDK | You're in Node/TS and want type safety, direct state access | In-process |
| RPC | Driving from another language, need process isolation | Subprocess, JSONL on stdin/stdout |
| JSON event stream | One-shot prompts piped into scripts | Subprocess, output only |
| Interactive TUI | Humans at a terminal | Same loop, terminal UI |
SDK: the canonical entry point
import {
AuthStorage, createAgentSession, ModelRegistry, SessionManager
} from "@earendil-works/pi-coding-agent";
const authStorage = AuthStorage.create();
const modelRegistry = ModelRegistry.create(authStorage);
const { session } = await createAgentSession({
sessionManager: SessionManager.inMemory(),
authStorage,
modelRegistry,
});
session.subscribe(event => {
if (event.type === "message_update"
&& event.assistantMessageEvent.type === "text_delta") {
process.stdout.write(event.assistantMessageEvent.delta);
}
});
await session.prompt("What files are in the current directory?");
The AgentSession contract
interface AgentSession {
// Send / queue prompts
prompt(text: string, options?: PromptOptions): Promise<void>;
steer(text: string): Promise<void>; // delivered after current tool
followUp(text: string): Promise<void>; // delivered when agent stops
// Observe
subscribe(listener: (event: AgentSessionEvent) => void): () => void;
readonly messages: AgentMessage[];
readonly isStreaming: boolean;
// Model state
setModel(model: Model): Promise<void>;
setThinkingLevel(level: ThinkingLevel): void;
cycleModel(): Promise<ModelCycleResult | undefined>;
// Tree navigation within the current session file
navigateTree(targetId: string, options?: {
summarize?: boolean; customInstructions?: string;
replaceInstructions?: boolean; label?: string;
}): Promise<{ editorText?: string; cancelled: boolean }>;
// Context engineering
compact(customInstructions?: string): Promise<CompactionResult>;
abortCompaction(): void;
abort(): Promise<void>;
dispose(): void;
}
The event vocabulary you'll subscribe to
type AgentSessionEvent =
// Lifecycle
| { type: "agent_start" }
| { type: "agent_end"; messages: AgentMessage[] }
| { type: "turn_start" }
| { type: "turn_end"; message: AgentMessage; toolResults: ToolResultMessage[] }
// Message lifecycle
| { type: "message_start"; message: AgentMessage }
| { type: "message_update"; message: AgentMessage; assistantMessageEvent: AssistantMessageEvent }
| { type: "message_end"; message: AgentMessage }
// Tool execution
| { type: "tool_execution_start"; toolCallId: string; toolName: string; args: unknown }
| { type: "tool_execution_update"; toolCallId: string; toolName: string; args: unknown; partialResult: unknown }
| { type: "tool_execution_end"; toolCallId: string; toolName: string; result: unknown; isError: boolean }
// Session
| { type: "queue_update"; steering: readonly string[]; followUp: readonly string[] }
| { type: "compaction_start"; reason: "manual" | "threshold" | "overflow" }
| { type: "compaction_end"; reason: ...; result: CompactionResult | undefined; aborted: boolean; willRetry: boolean }
| { type: "auto_retry_start"; attempt: number; maxAttempts: number; delayMs: number; errorMessage: string }
| { type: "auto_retry_end"; success: boolean; attempt: number; finalError?: string };
Defining tools at the SDK layer
import { Type } from "typebox";
import { defineTool, createAgentSession } from "@earendil-works/pi-coding-agent";
const statusTool = defineTool({
name: "status",
label: "Status",
description: "Get system status",
parameters: Type.Object({}),
execute: async () => ({
content: [{ type: "text", text: `Uptime: ${process.uptime()}s` }],
details: {},
}),
});
const { session } = await createAgentSession({
tools: ["read", "bash", "status"], // include built-ins + custom
customTools: [statusTool],
});
Built-in tools: read, bash, edit, write, grep, find, ls. Default set: the first four. Pass noTools: "all" to disable everything, noTools: "builtin" to keep only extension and custom tools.
RPC: JSONL over stdin/stdout
Start with pi --mode rpc. Commands go in (one JSON object per line, LF-only — Node's readline is not protocol-compliant because it also splits on U+2028/U+2029). Events come out. Each command may include an id for correlation; the corresponding response echoes the same id.
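For a TypeScript client, the LF-only rule means buffering stdout and splitting on "\n" yourself instead of using readline. The sketch below is minimal, and JsonlSplitter is a hypothetical name.

```typescript
class JsonlSplitter {
  private buffer = "";

  // Accepts a decoded text chunk and returns every record it completes.
  // Splits on "\n" only, so U+2028/U+2029 inside JSON strings stay in the record.
  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const records: unknown[] = [];
    let newline = this.buffer.indexOf("\n");
    while (newline !== -1) {
      const line = this.buffer.slice(0, newline);
      this.buffer = this.buffer.slice(newline + 1);
      if (line.length > 0) records.push(JSON.parse(line));
      newline = this.buffer.indexOf("\n");
    }
    return records;
  }
}
```

To use it, call setEncoding("utf8") on the child process's stdout first, so multi-byte characters split across chunks are decoded correctly. Then pass each data chunk to push.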
Command shapes (selection)
// Send / queue
{"type":"prompt","id":"req-1","message":"Hello"}
{"type":"prompt","message":"Stop and do this","streamingBehavior":"steer"}
{"type":"prompt","message":"After you're done","streamingBehavior":"followUp"}
{"type":"steer","message":"..."}
{"type":"follow_up","message":"..."}
{"type":"abort"}
// State
{"type":"get_state"}
{"type":"get_messages"}
{"type":"get_session_stats"}
// Model
{"type":"set_model","provider":"anthropic","modelId":"claude-sonnet-4-20250514"}
{"type":"cycle_model"}
{"type":"set_thinking_level","level":"high"}
// Compaction / retry
{"type":"compact","customInstructions":"Focus on code changes"}
{"type":"set_auto_compaction","enabled":true}
{"type":"set_auto_retry","enabled":true}
// Session tree
{"type":"new_session"}
{"type":"switch_session","sessionPath":"/path/to/session.jsonl"}
{"type":"fork","entryId":"abc123"}
{"type":"clone"}
{"type":"set_session_name","name":"refactor-auth"}
// Bash through Pi (output is added to LLM context on the NEXT prompt)
{"type":"bash","command":"ls -la"}
Response shape
// success
{"id":"req-1","type":"response","command":"prompt","success":true}
// with data
{"id":"req-2","type":"response","command":"get_state","success":true,
"data": { "model": {...}, "thinkingLevel":"medium", "isStreaming":false, ... }}
// failure
{"type":"response","command":"set_model","success":false,
"error":"Model not found: invalid/model"}
A minimal Python client
import json
import subprocess
proc = subprocess.Popen(
    ["pi", "--mode", "rpc", "--no-session"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
)
def send(cmd):
    proc.stdin.write(json.dumps(cmd) + "\n")
    proc.stdin.flush()
def events():
    for line in proc.stdout:
        yield json.loads(line)
send({"type": "prompt", "message": "Hello!"})
for evt in events():
    if evt.get("type") == "message_update":
        d = evt.get("assistantMessageEvent", {})
        if d.get("type") == "text_delta":
            print(d["delta"], end="", flush=True)
    if evt.get("type") == "agent_end":
        print()
        break
proc.terminate()
The extension UI sub-protocol
Extensions can request user interaction (confirm dialogs, selects, free-form input, multi-line editor). In RPC mode these become a request/response sub-protocol on top of the base flow. Requests have type: "extension_ui_request" with a unique id and a method; the client replies with extension_ui_response echoing the same id. Dialog methods (select, confirm, input, editor) block until the client responds. Fire-and-forget methods (notify, setStatus, setWidget, setTitle, set_editor_text) do not expect a response.
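The sketch below shows one way a client can route these requests. Only type, id, and method come from the description above. The fields that carry the user's answer are left to a caller-supplied function, because their exact names are not documented here. Any method not listed as a dialog is treated as fire-and-forget; that is a choice of this sketch.

```typescript
type UiRequest = { type: "extension_ui_request"; id: string; method: string; [k: string]: unknown };
type UiResponse = { type: "extension_ui_response"; id: string; [k: string]: unknown };

const DIALOG_METHODS = new Set(["select", "confirm", "input", "editor"]);

function handleUiRequest(
  req: UiRequest,
  answer: (req: UiRequest) => Record<string, unknown>,
): UiResponse | null {
  // Fire-and-forget methods (notify, setStatus, setWidget, ...) expect no reply.
  if (!DIALOG_METHODS.has(req.method)) return null;
  // Dialogs block until a response echoing the same id arrives.
  // type and id are written last so the answer payload cannot override them.
  return { ...answer(req), type: "extension_ui_response", id: req.id };
}
```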
JSON event stream mode
pi --mode json "your prompt" writes the session header plus every AgentSessionEvent to stdout as JSONL, then exits. Same event types as the SDK. Useful for one-shot prompts in shell scripts.
$ pi --mode json "List files" 2>/dev/null | jq -c 'select(.type == "message_end")'
Pi-to-Pi protocol: full reference implementation
Chapter 08 explained why peer-to-peer communication matters. This chapter is the protocol. Four tools, two delivery modes, two implementations. Everything below is type-complete TypeScript you can lift into a Pi extension and adapt. None of it requires changes to Pi's core.
The protocol in one page
An agent on the network is identified by a name (free-text, set when the agent joins). Every peer can do four things: enumerate the pool, send a message, await a specific reply, or poll for any reply. Messages have a stable messageId; replies reference it.
// Wire types: same shape for in-process and HTTP transports
type AgentName = string;
type MessageId = string;
interface PeerMessage {
  messageId: MessageId;
  inReplyTo?: MessageId; // present iff this is a reply
  from: AgentName;
  to: AgentName;
  text: string;
  attachments?: { mimeType: string; data: string }[];
  ts: number;
}
interface PeerInbox {
  pending: PeerMessage[]; // messages waiting to be claimed by the LLM
}
The four tools (LLM-facing)
// 1. list: enumerate other agents on the network
declare function list_agents(): { agents: AgentName[] };
// 2. send: deliver a prompt to a peer, return the message id
declare function send_to_agent(args: { to: AgentName; text: string }): { messageId: MessageId };
// 3. await: block until the peer responds to a specific message id
declare function await_reply(args: { messageId: MessageId; timeoutMs?: number }): { reply: PeerMessage | null };
// 4. check: non-blocking poll, return any new inbound messages
declare function check_inbox(): { messages: PeerMessage[] };
That is the entire public surface. Everything else is plumbing.
Implementation A: comms (single device, in-process)
All Pi sessions that run inside one Node process (for example, several SDK sessions started by the same host program) can share a single in-memory broker; separate Pi processes cannot, because they do not share memory. The talk's reference uses a per-process singleton plus a Node EventEmitter. For Pi extensions, you express the same thing as a shared module that all agents import.
// pool.ts: single shared in-process broker (singleton)
import { EventEmitter } from "node:events";
import { randomUUID } from "node:crypto";
class CommsPool {
  private agents = new Map<AgentName, EventEmitter>();
  private inboxes = new Map<AgentName, PeerMessage[]>();
  join(name: AgentName) {
    if (this.agents.has(name)) throw new Error(`Agent ${name} already joined`);
    this.agents.set(name, new EventEmitter());
    this.inboxes.set(name, []);
  }
  leave(name: AgentName) { this.agents.delete(name); this.inboxes.delete(name); }
  list(self: AgentName): AgentName[] {
    return [...this.agents.keys()].filter(n => n !== self);
  }
  send(msg: Omit<PeerMessage, "messageId" | "ts">): MessageId {
    if (!this.agents.has(msg.to)) throw new Error(`Unknown agent: ${msg.to}`);
    const full: PeerMessage = { ...msg, messageId: randomUUID(), ts: Date.now() };
    this.inboxes.get(msg.to)!.push(full);
    this.agents.get(msg.to)!.emit("message", full);
    return full.messageId;
  }
  drain(self: AgentName): PeerMessage[] {
    const inbox = this.inboxes.get(self) ?? [];
    this.inboxes.set(self, []);
    return inbox;
  }
  // Remove and return a reply to toMessageId if one is already in the inbox
  private claimReply(self: AgentName, toMessageId: MessageId): PeerMessage | null {
    const inbox = this.inboxes.get(self);
    const ix = inbox ? inbox.findIndex(x => x.inReplyTo === toMessageId) : -1;
    return ix >= 0 ? inbox!.splice(ix, 1)[0] : null;
  }
  awaitReply(self: AgentName, toMessageId: MessageId, timeoutMs: number): Promise<PeerMessage | null> {
    const ee = this.agents.get(self);
    if (!ee) return Promise.resolve(null); // not joined, or already left
    // The reply may have arrived before await_reply was called: claim it right away
    const early = this.claimReply(self, toMessageId);
    if (early) return Promise.resolve(early);
    return new Promise(resolve => {
      const onMessage = (m: PeerMessage) => {
        if (m.inReplyTo !== toMessageId) return;
        // claim it out of the inbox so check_inbox doesn't double-deliver
        cleanup(); resolve(this.claimReply(self, toMessageId) ?? m);
      };
      const t = setTimeout(() => { cleanup(); resolve(null); }, timeoutMs);
      const cleanup = () => { clearTimeout(t); ee.off("message", onMessage); };
      ee.on("message", onMessage);
    });
  }
}
// One module-scoped pool, shared by every Pi instance running in this process
export const pool = new CommsPool();
Then the Pi extension that exposes the four tools to the LLM:
// comms-extension.ts
import type { ExtensionAPI } from "@earendil-works/pi-coding-agent";
import { Type } from "typebox";
import { randomUUID } from "node:crypto";
import { pool } from "./pool";
export default function (pi: ExtensionAPI) {
  // Each Pi gets a name from a flag. The fallback must be unique per instance:
  // in-process agents all share one process.pid, so a pid-based name would collide.
  pi.registerFlag("agent-name", { description: "Name on the comms pool", type: "string" });
  const self = (pi.getFlag("agent-name") as string | undefined) ?? `agent-${randomUUID().slice(0, 8)}`;
  pool.join(self);
  pi.on("session_shutdown", () => pool.leave(self));
  // Pool calls below are awaited so an async transport can be swapped in later.
  pi.registerTool({
    name: "list_agents",
    label: "List agents",
    description: "List the other agents currently joined to the comms pool.",
    parameters: Type.Object({}),
    execute: async () => ({
      content: [{ type: "text", text: JSON.stringify({ agents: await pool.list(self) }) }],
      details: {},
    }),
  });
  pi.registerTool({
    name: "send_to_agent",
    label: "Send to peer",
    description: "Send a prompt to another agent. Returns a messageId you can await.",
    parameters: Type.Object({
      to: Type.String({ description: "Peer agent name" }),
      text: Type.String({ description: "Prompt or message" }),
    }),
    execute: async (_id, params) => {
      const messageId = await pool.send({ from: self, to: params.to, text: params.text });
      return { content: [{ type: "text", text: JSON.stringify({ messageId }) }], details: {} };
    },
  });
  pi.registerTool({
    name: "await_reply",
    label: "Await reply",
    description: "Block until the peer replies to a specific messageId, or timeout.",
    parameters: Type.Object({
      messageId: Type.String(),
      timeoutMs: Type.Optional(Type.Integer({ minimum: 1, default: 60_000 })),
    }),
    execute: async (_id, params, signal) => {
      // Resolve with null if the tool call is aborted; guard against a missing signal
      const aborted = new Promise<null>(resolve => {
        if (signal?.aborted) return resolve(null);
        signal?.addEventListener("abort", () => resolve(null), { once: true });
      });
      const reply = await Promise.race([
        pool.awaitReply(self, params.messageId, params.timeoutMs ?? 60_000),
        aborted,
      ]);
      return { content: [{ type: "text", text: JSON.stringify({ reply }) }], details: {} };
    },
  });
  pi.registerTool({
    name: "check_inbox",
    label: "Check inbox",
    description: "Non-blocking: return any new messages addressed to this agent.",
    parameters: Type.Object({}),
    execute: async () => {
      const messages = await pool.drain(self);
      return { content: [{ type: "text", text: JSON.stringify({ messages }) }], details: {} };
    },
  });
  // Inbound message: inject it as a custom message so the LLM sees it next turn.
  // This reaches into CommsPool's private emitter map; an HTTP transport must poll instead.
  // The message also stays in the inbox, so check_inbox may return it again (see the
  // idempotency row in the failure table).
  pool["agents"].get(self)!.on("message", (m: PeerMessage) => {
    if (m.inReplyTo) return; // replies are pulled via await_reply / check_inbox
    pi.sendMessage(
      { customType: "comms:inbound",
        content: `[from ${m.from}] ${m.text}`,
        display: true,
        details: { messageId: m.messageId, from: m.from } },
      { deliverAs: "steer", triggerTurn: true }
    );
  });
}
Implementation B: comms-net (across machines)
For agents on different machines, swap the in-process broker for a tiny HTTP server. The protocol stays identical; only the transport changes. Any HTTP server works; Bun happens to be the talk's choice because of cold-start speed and built-in TypeScript.
// server.ts: start once per pool host (assumes the wire types above are in scope)
import { serve } from "bun";
const agents = new Map<AgentName, { lastSeen: number }>();
const inboxes = new Map<AgentName, PeerMessage[]>();
// Cap long-polls so they finish inside the connection idle timeout set below
const MAX_AWAIT_MS = 240_000;
function ok(body: unknown) {
  return new Response(JSON.stringify(body), { headers: { "content-type": "application/json" } });
}
serve({
  port: 8787,
  // Bun closes idle connections after 10 s by default, which would cut long-polls short.
  // 255 s is the largest value Bun accepts.
  idleTimeout: 255,
  async fetch(req) {
    const url = new URL(req.url);
    const body = req.method === "POST" ? await req.json() : null;
    switch (`${req.method} ${url.pathname}`) {
      case "POST /join": {
        // Also serves as the heartbeat: re-joining refreshes lastSeen
        const { name } = body as { name: string };
        agents.set(name, { lastSeen: Date.now() });
        inboxes.set(name, inboxes.get(name) ?? []);
        return ok({ ok: true });
      }
      case "POST /leave": {
        const { name } = body as { name: string };
        agents.delete(name); inboxes.delete(name);
        return ok({ ok: true });
      }
      case "GET /agents": {
        const self = url.searchParams.get("self");
        return ok({ agents: [...agents.keys()].filter(n => n !== self) });
      }
      case "POST /send": {
        const m = body as Omit<PeerMessage, "messageId" | "ts">;
        if (!agents.has(m.to)) return ok({ error: `Unknown agent: ${m.to}` });
        const full: PeerMessage = { ...m, messageId: crypto.randomUUID(), ts: Date.now() };
        inboxes.get(m.to)!.push(full);
        return ok({ messageId: full.messageId });
      }
      case "POST /drain": {
        const { self } = body as { self: string };
        const out = inboxes.get(self) ?? [];
        inboxes.set(self, []);
        return ok({ messages: out });
      }
      case "POST /await": {
        // Long-poll: block server-side until a matching reply arrives or timeout
        const { self, messageId, timeoutMs } = body as { self: string; messageId: string; timeoutMs: number };
        const deadline = Date.now() + Math.min(timeoutMs, MAX_AWAIT_MS);
        while (Date.now() < deadline) {
          const inbox = inboxes.get(self) ?? [];
          const ix = inbox.findIndex(m => m.inReplyTo === messageId && m.to === self);
          if (ix >= 0) {
            const [m] = inbox.splice(ix, 1);
            return ok({ reply: m });
          }
          await Bun.sleep(100);
        }
        return ok({ reply: null });
      }
      default:
        return new Response("not found", { status: 404 });
    }
  },
});
console.log("comms-net listening on http://localhost:8787");
And the client side, which mirrors the in-process pool's methods but makes each one async:
// net-pool.ts: client that the extension uses in place of CommsPool
class NetPool {
  constructor(private base: string) {}
  private async post(path: string, body: unknown) {
    const r = await fetch(`${this.base}${path}`, {
      method: "POST", headers: { "content-type": "application/json" },
      body: JSON.stringify(body),
    });
    return r.json();
  }
  async join(name: AgentName) { return this.post("/join", { name }); }
  async leave(name: AgentName) { return this.post("/leave", { name }); }
  async list(self: AgentName): Promise<AgentName[]> {
    const r = await fetch(`${this.base}/agents?self=${encodeURIComponent(self)}`);
    return (await r.json()).agents;
  }
  async send(m: Omit<PeerMessage, "messageId" | "ts">): Promise<MessageId> {
    const { messageId, error } = await this.post("/send", m);
    if (error) throw new Error(error);
    return messageId;
  }
  async drain(self: AgentName): Promise<PeerMessage[]> {
    const { messages } = await this.post("/drain", { self });
    return messages;
  }
  async awaitReply(self: AgentName, messageId: MessageId, timeoutMs: number) {
    const { reply } = await this.post("/await", { self, messageId, timeoutMs });
    return reply as PeerMessage | null;
  }
}
// To switch transports, replace `pool = new CommsPool()` with
// `pool = new NetPool(process.env.COMMS_NET_URL ?? "http://localhost:8787")`.
// Two caveats: every NetPool method is async, so the extension must await each pool
// call; and NetPool has no event emitter, so the inbound "message" listener has to
// become a timer that polls drain(), or the agent relies on check_inbox alone.
Failure modes and what to do about them
| Failure | Symptom | Mitigation |
|---|---|---|
| Peer crashed mid-conversation | await_reply times out, no error from the broker | Bound every await with a sane timeoutMs; have the agent prompt fall back to "peer unavailable, proceed without confirmation." |
| Network partition (comms-net) | Sends succeed locally but never reach peers; long-poll never returns | Heartbeat: agents POST /join every N seconds. Server evicts entries past lastSeen + 3N. List excludes evicted names. |
| Tight reply loops | Two agents prompt each other indefinitely; token spend climbs | End-state in the prompt ("reply DONE when the answer is final"). Cap turns: refuse to send if the conversation graph exceeds N exchanges. |
| PII leakage across peers | One peer holds sensitive data; another asks for it | Per-agent system-prompt rules. Add a tool_call hook on send_to_agent that scrubs known patterns from outgoing text. Treat peers as untrusted by default. |
| Replay / duplicate delivery | Same messageId appears twice in an inbox | Idempotency on the LLM side: include the messageId in the rendered prompt, instruct "ignore messages whose messageId you have already replied to." |
| Authorization | Arbitrary processes can POST to the server | Bearer token from env var on every request. TLS for cross-host. The reference is intentionally bare; production needs both. |
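Two of these mitigations are small enough to sketch. The helpers below (`evictStale`, `ExchangeBudget`) are hypothetical additions to the comms-net server, not part of the reference code; they assume the server's `agents` and `inboxes` maps and a heartbeat interval N expressed in milliseconds.

```typescript
// Heartbeat eviction: an agent that has not re-POSTed /join within
// 3 heartbeat intervals is treated as gone. Returns the evicted names.
function evictStale(
  agents: Map<string, { lastSeen: number }>,
  inboxes: Map<string, unknown[]>,
  now: number,
  heartbeatMs: number,
): string[] {
  const evicted: string[] = [];
  for (const [name, { lastSeen }] of agents) {
    if (now > lastSeen + 3 * heartbeatMs) {
      agents.delete(name); // deleting during Map iteration is safe in JavaScript
      inboxes.delete(name);
      evicted.push(name);
    }
  }
  return evicted;
}

// Exchange cap: each unordered pair of agents gets a fixed budget of sends.
// This approximates "the conversation graph exceeds N exchanges" per pair only.
class ExchangeBudget {
  private counts = new Map<string, number>();
  private readonly max: number;
  constructor(maxExchanges: number) {
    this.max = maxExchanges;
  }
  tryConsume(from: string, to: string): boolean {
    const key = [from, to].sort().join("\u0000"); // A->B and B->A share one budget
    const used = this.counts.get(key) ?? 0;
    if (used >= this.max) return false; // caller should refuse the send
    this.counts.set(key, used + 1);
    return true;
  }
}
```

A server would run evictStale from a setInterval every N milliseconds and call tryConsume at the top of POST /send, returning an error when it comes back false.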
Why these four tools and not more
Sub-agent delegation, message-queue brokers, and pipelines (agent chains) all collapse into these four primitives. Sub-agent delegation: parent agent sends then awaits; the child checks on its own loop. Message broker: one agent is the only send target; it routes by inspecting messages. Pipeline: each stage awaits the previous and sends to the next. The four tools subsume the patterns; the patterns do not subsume the tools.
The reference above will not survive production without auth, TLS, heartbeats, idempotency, and a permission boundary on bash. The talk's framing — "read and adapt, throw your agents at it" — is correct. The point of the four-tool API is that adapting only requires changing the transport. The contract the LLM sees is stable.
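To make the subsumption claim concrete, here is delegation and a pipeline written against nothing but send and await. The `Peers` interface and the `FakePeers` test double are hypothetical scaffolding, not the comms pool above, and the pipeline is driven from one coordinator for brevity rather than having stages forward to each other.

```typescript
type Msg = { messageId: string; inReplyTo?: string; from: string; to: string; text: string };

// The two primitives these patterns need (list and check_inbox are unused here).
interface Peers {
  send(from: string, to: string, text: string): string; // returns the messageId
  awaitReply(self: string, messageId: string): Promise<Msg | null>;
}

// Test double: each peer is a function from request text to reply text.
class FakePeers implements Peers {
  private nextId = 0;
  private replies = new Map<string, Msg>();
  private handlers: Record<string, (text: string) => string>;
  constructor(handlers: Record<string, (text: string) => string>) {
    this.handlers = handlers;
  }
  send(from: string, to: string, text: string): string {
    const handler = this.handlers[to];
    if (!handler) throw new Error(`Unknown agent: ${to}`);
    const messageId = `m${this.nextId++}`;
    // The simulated peer replies at once, addressed back to the sender.
    const reply: Msg = { messageId: `m${this.nextId++}`, inReplyTo: messageId, from: to, to: from, text: handler(text) };
    this.replies.set(messageId, reply);
    return messageId;
  }
  async awaitReply(_self: string, messageId: string): Promise<Msg | null> {
    return this.replies.get(messageId) ?? null;
  }
}

// Sub-agent delegation: the parent sends, then awaits the child's reply.
async function delegate(p: Peers, parent: string, child: string, task: string): Promise<string | null> {
  const id = p.send(parent, child, task);
  return (await p.awaitReply(parent, id))?.text ?? null;
}

// Pipeline: each stage's reply becomes the next stage's input.
async function pipeline(p: Peers, driver: string, stages: string[], input: string): Promise<string | null> {
  let current: string | null = input;
  for (const stage of stages) {
    if (current === null) return null; // a stage timed out; stop the chain
    current = await delegate(p, driver, stage, current);
  }
  return current;
}
```

Swapping FakePeers for an adapter over the real pool changes nothing in delegate or pipeline, which is the point of keeping the LLM-facing contract at four tools.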
The conceptual case for peer-to-peer (and the two demos that motivated this protocol) is in Chapter 08. The hooks this extension relies on (registerTool, sendMessage, session_shutdown) are in Chapter 11. The agent loop that calls these tools is in Chapter 10.
Reconstruction recipe #
If Pi vanished tomorrow, how would you rebuild this stack? In what order? With what shortcuts? This chapter is a build sequence calibrated to "minimum viable harness in a weekend, production-grade in a quarter."
Build order, eight steps
1. One model client for one provider. Anthropic Messages or OpenAI Responses both ship streaming, tool-use, and vision. Pick one. Implement complete() against its HTTP API as an async generator that yields text_delta, toolcall_delta, and a final done event. Stop. Do not abstract over providers yet.
2. A tool registry with three tools: read, write, bash. Validate parameters with TypeBox or Zod. Return { content, details, isError }. Resist the urge to add edit / grep / find until the loop is running.
3. The agent loop from Chapter 10. Call the model, append the assistant message, run any tool calls, append the tool results, loop. ~60 lines of code. You now have a working agent.
4. A JSONL session store with one entry type (message) and a parentId field that always points at the previous entry. Persist on every append. Don't implement the tree yet; just append linearly. You can already replay and resume sessions.
5. An extension bus with five hooks: session_start, before_agent_start, tool_call, tool_result, agent_end. That covers ~80% of useful extensions (permission gates, redaction, observability, injection). Load extensions from one directory; treat them as default-exported factories that receive your ExtensionAPI.
6. Compaction. Walk back from the leaf summing tokens; if the total exceeds contextWindow - reserveTokens, summarize everything older than the most recent keepRecentTokens worth of entries with a structured prompt, append a compaction entry, and rebuild context from the summary plus the kept tail. Don't implement branch summaries yet.
7. The tree. Switch parentId from "previous entry" to "actual parent." Add branch(entryId) to move the leaf back. Add a branch_summary entry type for navigation. You now have undo, fork, and clone for free.
8. One non-interactive surface. Pick RPC or JSON event stream. The contract is "JSON in, JSON out, line-delimited." Once you ship one, the other is a small variant. Save the SDK and the full TUI for last; they are the most code per unit of capability.
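Step 6 is the only step with real arithmetic. A minimal sketch of the cut-point search, assuming each entry on the current branch carries a precomputed token estimate (the `tokens` field is hypothetical). It ignores turn boundaries, which a real implementation would respect so a tool call is never separated from its result.

```typescript
interface Entry { id: string; tokens: number }

// Returns the index of the first entry to keep verbatim, or -1 if no compaction is needed.
// Everything before that index goes into the structured summary.
function findCutPoint(
  path: Entry[],            // root-to-leaf entries on the current branch
  contextWindow: number,
  reserveTokens: number,    // headroom left for the next response
  keepRecentTokens: number, // budget for the verbatim tail
): number {
  const total = path.reduce((sum, e) => sum + e.tokens, 0);
  if (total <= contextWindow - reserveTokens) return -1;
  // Walk back from the leaf, keeping entries while they fit in the tail budget.
  let kept = 0;
  let cut = path.length; // if even the leaf is too big, summarize everything
  for (let i = path.length - 1; i >= 0; i--) {
    if (kept + path[i].tokens > keepRecentTokens) break;
    kept += path[i].tokens;
    cut = i;
  }
  return cut;
}
```

Everything in path.slice(0, cut) goes to the summarizer; path.slice(cut) is the kept tail that follows the compaction entry.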
What to defer
- Multiple providers. One is enough until users ask for a second.
- OAuth. API keys cover the first 95% of use cases.
- Themes, custom renderers, TUI components. They are nice; they are not the loop.
- Sub-agent delegation. Build peer-to-peer first; sub-agent is a special case (see Chapter 14).
- MCP. Tools you control with READMEs and CLI flags cover the same ground; see Mario Zechner's essay on why.
What to invest in early
- Session as JSONL. Pays back the day you have a crash you can't reproduce.
- Hooks for tool_call and tool_result. Every safety, observability, and customization extension lives here.
- A pre-flight permission gate on bash. Cheap to add, expensive to skip.
- Compaction with a structured summary. Long conversations are the default. Free-form summaries collapse into mush by turn 50.
A minimum viable stack, in files
my-harness/
├── package.json
├── src/
│ ├── index.ts # Entry point: parse args, build session, run loop
│ ├── loop.ts # The agent loop from Chapter 10
│ ├── model/
│ │ └── anthropic.ts # complete() against Anthropic Messages
│ ├── tools/
│ │ ├── registry.ts # Tool definition + active set
│ │ ├── read.ts
│ │ ├── write.ts
│ │ └── bash.ts
│ ├── session/
│ │ ├── store.ts # Append-only JSONL, parentId pointers
│ │ └── context.ts # buildContext() with compaction
│ ├── compaction/
│ │ └── summarize.ts # Structured-summary prompt + call
│ ├── extensions/
│ │ ├── api.ts # ExtensionAPI surface
│ │ ├── bus.ts # Hook dispatch
│ │ └── loader.ts # Read ~/.my-harness/extensions/*.ts
│ └── modes/
│ ├── interactive.ts # Optional, last
│ └── rpc.ts # JSON in, JSON out
└── examples/
└── extensions/
├── permission-gate.ts
├── redact-pii.ts
└── comms.ts # The four-tool peer-to-peer extension
~2,000 lines of TypeScript gets you a working harness. The remaining 20,000 lines that go into a polished tool like Pi are TUI components, settings management, OAuth flows, custom-provider quirks, dozens of built-in tools, theme system, package manager, RPC extension UI sub-protocol, and so on. Each of those is independent of the loop.
Three checkpoints to know you're on track
- You can replay a session. Load a JSONL file, walk the entries, hand the model identical messages, get an identical-shaped (not bit-identical) response.
- You can write a one-file extension that blocks rm -rf without editing the core.
- Two of your harnesses can hold a conversation. Use the four-tool protocol from Chapter 14. If they can collaborate to solve a task, the loop, the session store, and the extension bus all work.
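The second checkpoint fits in one file. A sketch, assuming a tool_call event that exposes toolName and input, and a handler that blocks by returning { block: true, reason }. The local `HookHost` type is a hypothetical stand-in for Pi's ExtensionAPI so the file runs on its own, and the rm detection is a heuristic, not a shell parser.

```typescript
// Hypothetical minimal stand-ins for Pi's extension types, so this file runs on its own.
// In a real extension, import ExtensionAPI from the Pi package and use its tool_call event.
type ToolCallEvent = { toolName: string; input: Record<string, unknown> };
type BlockResult = { block: true; reason: string };
type HookHost = { on(event: "tool_call", handler: (e: ToolCallEvent) => BlockResult | undefined): void };

// Heuristic: true if any simple command in a compound line runs rm with both recursive
// and force flags (rm -rf, rm -fr, sudo rm -r -f, rm --recursive --force).
// It does not understand quoting, subshells, or aliases.
function isRecursiveForceRm(command: string): boolean {
  for (const part of command.split(/&&|\|\||;|\|/)) {
    const words = part.trim().split(/\s+/);
    const at = words.findIndex(w => w === "rm" || w.endsWith("/rm"));
    if (at < 0) continue;
    const args = words.slice(at + 1);
    const shortFlags = args.filter(a => /^-[^-]/.test(a)).join("");
    const recursive = /[rR]/.test(shortFlags) || args.includes("--recursive");
    const force = /f/.test(shortFlags) || args.includes("--force");
    if (recursive && force) return true;
  }
  return false;
}

// The extension factory (export it as default in a real extension file).
function blockRmRf(pi: HookHost): void {
  pi.on("tool_call", (event) => {
    const command = event.input.command;
    if (event.toolName === "bash" && typeof command === "string" && isRecursiveForceRm(command)) {
      return { block: true, reason: "rm -rf is blocked by the rm-rf guard extension" };
    }
    return undefined;
  });
}
```

The core never learns about the policy: dropping this file into the extensions directory is the whole installation, which is exactly what the checkpoint tests.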
You are not rebuilding Pi. You are proving to yourself that the architecture in Chapter 10 is small enough to internalize. Once you have, the question "should we adopt or build" answers itself per situation. For most teams, the answer is "adopt and extend." The reason that answer is comfortable is that you know what you would have built.
For the actual Pi implementation, the most informative entry points are session-manager.ts, compaction.ts, and the examples/extensions folder. Read them in that order.
Claude Code as the floor #
Claude Code is the most polished agentic coding tool on the market. It ships with batteries included: a curated toolset, a permission system, a 4-level memory hierarchy, hooks, skills, MCP, sub-agents, sessions with auto-compaction. To honor the talk's framing, this chapter takes Claude Code seriously in its own terms before contrasting it with Pi.
"Floor" here means baseline of what's possible, not "low quality." Claude Code is what most senior engineers should start with. The talk's argument is that Claude Code is also where most engineers stop, and the gap between low- and high-performing agentic engineers shows up when you push past what your harness ships with. The rest of the manual is about that ceiling. This chapter is about the floor it rests on.
The Claude Code architecture in one paragraph
Claude Code runs the same agent loop as Pi (see Chapter 10). The differences live in what's wired into the loop by default. Anthropic does the wiring; you customize within the boundary they expose. A continuous loop reads your message, assembles context (git status + 4 levels of CLAUDE.md + current date + tool list, all memoized), calls the Anthropic API with the active tool set, runs each tool call after a permission check, appends the result, and loops until the model emits a turn with no tool calls. Hooks can fire on lifecycle events. MCP servers can add external tools. Sub-agents can be spawned via the Task tool. Sessions are JSON transcripts in ~/.claude/, resumed by session ID, periodically compacted.
What you get out of the box
Built-in tools
A curated set, much larger than Pi's. Read (handles PDFs and notebooks too), Edit (exact string replacement with uniqueness check), Write, Glob, Grep (ripgrep-backed), LS, Bash (persistent shell session with compound-command checks and background execution), WebFetch (HTTPS-upgraded with 15-min cache, runs a secondary model to extract), WebSearch (auto-appends Sources:), Task (spawn sub-agents), TodoWrite (the structured task list you see during agent runs), NotebookEdit. MCP-provided tools appear with the mcp__ prefix.
4-level memory hierarchy via CLAUDE.md
The most distinctive Claude Code feature. Memory files load from lowest to highest priority:
- Managed: /etc/claude-code/CLAUDE.md + rules/ (admin-set, can be policy-enforced)
- User: ~/.claude/CLAUDE.md + ~/.claude/rules/*.md (your global preferences)
- Project: CLAUDE.md + .claude/CLAUDE.md + .claude/rules/*.md in every ancestor directory (team-shared, committed)
- Local: CLAUDE.local.md (personal project overrides, gitignored)
Files closer to cwd load later, so they win. @include directives pull in other files (up to 5 levels deep, circular refs detected). Rule files in .claude/rules/ support path-scoped frontmatter — a rule for src/api/** only injects when Claude touches matching files. Max file size: 40,000 chars. Loaded files are prefixed with "These instructions OVERRIDE any default behavior and you MUST follow them exactly as written."
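The cascade is easier to see as a list of paths. A sketch of the load order described above, lowest priority first. It assumes POSIX paths, looks for CLAUDE.local.md in cwd only, and omits rules/*.md and @include expansion; it illustrates the ordering, not Claude Code's own discovery code.

```typescript
import * as path from "node:path";

// Candidate memory files in load order (lowest priority first, so later entries win).
function memoryLoadOrder(cwd: string, home: string): string[] {
  const p = path.posix;
  const order: string[] = [
    "/etc/claude-code/CLAUDE.md",         // managed
    p.join(home, ".claude", "CLAUDE.md"), // user
  ];
  // Collect ancestors from the filesystem root down to cwd.
  const ancestors: string[] = [];
  for (let dir = p.resolve(cwd); ; dir = p.dirname(dir)) {
    ancestors.unshift(dir);
    if (dir === p.dirname(dir)) break; // reached "/"
  }
  // Project files: directories closer to cwd come later, so they override.
  for (const dir of ancestors) {
    order.push(p.join(dir, "CLAUDE.md"), p.join(dir, ".claude", "CLAUDE.md"));
  }
  order.push(p.join(cwd, "CLAUDE.local.md")); // local, loaded last
  return order;
}
```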
Permissions with four modes and rule matching
Every tool call passes through checkPermissions. Result is allow / ask / deny. The active mode sets the default:
- default: prompt on potentially dangerous ops; auto-approve read-only
- acceptEdits: auto-approve Edit and Write, still prompt on bash
- plan: read-only; all writes and bash blocked; Claude can call ExitPlanMode to request approval
- bypassPermissions: disable all checks (only for sandboxed/automated runs)
Allow/deny rules with wildcard matching layer on top. Bash compound commands (&&, ||, ;, |) are split and each part is checked independently — most restrictive result wins. Output redirections outside the project, cd outside the working tree, sed -i, and writes to .claude/ or .git/ get extra scrutiny regardless of mode.
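A sketch of how wildcard rules and compound splitting combine. Only the Bash(<pattern>) form is handled, deny beats ask beats allow, and unmatched commands fall back to ask; all three choices are assumptions made for illustration, and Claude Code's real matcher handles more syntax and edge cases.

```typescript
type Decision = "allow" | "ask" | "deny";
interface Rules { allow: string[]; ask: string[]; deny: string[] }

// Test a rule like "Bash(git *)" against one simple command. "*" matches any characters.
function ruleMatches(rule: string, command: string): boolean {
  const m = /^Bash\((.*)\)$/.exec(rule);
  if (!m) return false;
  // Escape regex metacharacters in each literal piece, then join the pieces with ".*".
  const pattern = m[1].split("*").map(s => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")).join(".*");
  return new RegExp(`^${pattern}$`).test(command);
}

const RANK: Record<Decision, number> = { allow: 0, ask: 1, deny: 2 };

// Decide one simple command: deny beats ask beats allow; unmatched commands ask.
function decideOne(command: string, rules: Rules): Decision {
  if (rules.deny.some(r => ruleMatches(r, command))) return "deny";
  if (rules.ask.some(r => ruleMatches(r, command))) return "ask";
  if (rules.allow.some(r => ruleMatches(r, command))) return "allow";
  return "ask";
}

// Split on &&, ||, ; and | (naively: quotes and subshells are not respected),
// decide each part, and keep the most restrictive result.
function decideBash(command: string, rules: Rules): Decision {
  return command
    .split(/&&|\|\||;|\|/)
    .map(part => part.trim())
    .filter(part => part.length > 0)
    .map(part => decideOne(part, rules))
    .reduce<Decision>((worst, d) => (RANK[d] > RANK[worst] ? d : worst), "allow");
}
```

Splitting first is what stops an allowed prefix from laundering a dangerous suffix: `git status && rm -rf /` is judged by its worst part, not its first.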
Hooks: automation on lifecycle events
Configured in settings.json. Each hook binds to an event (PreToolUse, PostToolUse, Stop, SessionStart, UserPromptSubmit, PreCompact, ...) with an optional matcher. The hook is a shell command, HTTP POST, LLM prompt, or full agent invocation. Exit code controls behavior: 0 means success, 2 blocks the action (and shows stderr to Claude), and any other code shows stderr to you while execution continues. This is the customization surface for most users.
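Because a command hook is just an executable, it can be written in TypeScript. A sketch of a PreToolUse hook that refuses Edit and Write calls on .env files. It assumes the event arrives on stdin as JSON with tool_name and tool_input.file_path; confirm those field names against Anthropic's hooks documentation. The decision is a pure function so it can be tested without Claude Code.

```typescript
interface HookInput { tool_name?: string; tool_input?: { file_path?: string } }

// Exit code 2 blocks the call and shows stderr to Claude; 0 lets it proceed.
function decide(input: HookInput): { exitCode: 0 | 2; stderr: string } {
  const file = input.tool_input?.file_path ?? "";
  const isWrite = input.tool_name === "Edit" || input.tool_name === "Write";
  // Matches .env, .env.local, config/.env.production; not env.ts or .envrc
  if (isWrite && /(^|\/)\.env(\.|$)/.test(file)) {
    return { exitCode: 2, stderr: `Refusing to modify ${file}: secrets files are read-only for agents.` };
  }
  return { exitCode: 0, stderr: "" };
}

// Entry point: read the whole event from stdin, decide, report, exit.
async function runHook(): Promise<void> {
  let raw = "";
  for await (const chunk of process.stdin) raw += chunk;
  const { exitCode, stderr } = decide(JSON.parse(raw) as HookInput);
  if (stderr) process.stderr.write(stderr + "\n");
  process.exit(exitCode);
}
```

Register the script under PreToolUse in settings.json with a matcher covering Edit and Write, and have its entry point call runHook().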
Skills
Markdown files in .claude/skills/. Frontmatter has description, argument-hint, allowed-tools, when_to_use, model (per-skill model override), paths (path-activated), context: fork (run in isolated subagent), hooks (skill-scoped hooks). Invoke with /skill-name. $ARGUMENTS substitutes the text after the command. Inline shell with !`command` runs at invocation time and injects output. Bundled skills ship in the binary. Path-activated skills auto-load when Claude touches matching files.
MCP servers
Model Context Protocol — connect external services. Configure in .mcp.json (project) or ~/.claude.json (user). Three transports: stdio (local subprocess), HTTP, SSE. Add with claude mcp add <name> -- <command>. Manage with /mcp enable, /mcp disable, /mcp reconnect. Tools from a server appear as mcp__<server>__<tool> and follow the same permission system. Anthropic and the community maintain a registry at modelcontextprotocol.io.
Multi-agent via the Task tool
Claude can spawn a sub-agent. Each gets a fresh context window, a specialized system prompt (per subagent_type), its own tool permissions, and runs to completion before reporting back. Modes: foreground (blocks parent), background (async, notification on completion), isolation: "worktree" (own git worktree). Persistent agent memory via ~/.claude/agent-memory/<agent-type>/MEMORY.md. Results capped at 100,000 chars. Sub-agents cannot themselves spawn teammates (flat roster); fork agents cannot fork (no recursive forking).
Sessions and compaction
JSON transcripts on disk in ~/.claude/. Each conversation has a unique session ID. Resume with --resume <id> or --resume alone for a picker. On resume, memory files are re-discovered and may differ; permission mode resets to configured default. Long conversations are periodically compacted — oldest messages summarized to keep the window manageable; the raw transcript is always preserved on disk.
Settings with four scopes
Global (~/.claude/settings.json), project (.claude/settings.json, committed), local (.claude/settings.local.json, not committed), managed (platform-specific MDM path). Merge from lowest to highest; managed wins last. Settings cover model, permissions, hooks, env vars, MCP allowlist, cleanup, worktree symlinks, attribution text, language, sandbox config. Managed-only locks: allowManagedHooksOnly, allowManagedPermissionRulesOnly, strictPluginOnlyCustomization.
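"Merge from lowest to highest" can be sketched as a recursive overlay. This assumes settings are plain JSON objects: nested objects merge key by key, and any other value from a higher scope replaces the lower one. How Claude Code combines list-valued settings such as permission rules is not modeled here and may differ.

```typescript
type Settings = { [key: string]: unknown };

function isPlainObject(v: unknown): v is Settings {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Overlay `high` onto `low`: nested objects merge, everything else is replaced.
function overlay(low: Settings, high: Settings): Settings {
  const out: Settings = { ...low };
  for (const [key, value] of Object.entries(high)) {
    const prev = out[key];
    out[key] = isPlainObject(prev) && isPlainObject(value) ? overlay(prev, value) : value;
  }
  return out;
}

// Scopes in ascending priority, so managed is applied last and wins.
function effectiveSettings(scopes: { global?: Settings; project?: Settings; local?: Settings; managed?: Settings }): Settings {
  return [scopes.global, scopes.project, scopes.local, scopes.managed]
    .reduce<Settings>((acc, s) => (s ? overlay(acc, s) : acc), {});
}
```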
Slash commands and CLI flags
CLI flags configure the session at launch (--model, --permission-mode, -p for non-interactive print, --mcp-config). Slash commands control the running session (/help, /init, /compact, /model, /permissions, /memory, /skills, /mcp, /hooks, /config). Built-in commands plus skills plus plugin commands all appear in /help.
Subcommands at the shell
claude mcp (configure servers), claude mcp serve (run Claude Code itself as an MCP server — neat for embedding), claude doctor (diagnose installation), claude update.
What "floor" means concretely
Three things are out of reach inside Claude Code by design:
- You cannot replace the agent loop. The loop is Anthropic's. You can intercept around it (hooks, MCP) but you cannot rewrite the steps.
- You cannot switch model families. Claude Code is built for Claude models, served through Anthropic's API, Amazon Bedrock, or Google Vertex AI. apiKeyHelper and forceLoginMethod change credentials, not the model; pointing Claude Code at OpenAI, Ollama, or an in-house model is not a supported configuration.
- You cannot define new lifecycle hooks. The hook events are a fixed enum. If you want to fire on something the enum doesn't cover, you wait for Anthropic to add it.
These are the boundaries of the floor. For most engineers, on most days, the boundaries are invisible. For the engineers in the talk's "top 2%" framing, the boundaries are exactly where the leverage lives.
Claude Code is the floor in the same sense that a great cookbook is the floor of cooking. You can produce excellent results indefinitely without ever leaving the cookbook. The chef who writes new recipes does so because they understand the constraints the cookbook imposes and have a reason to push past them. Most days, follow the recipe. Some days, write your own.
Anthropic's official Claude Code docs · the community Claude Code wiki this chapter is built on (mintlify.wiki/VineeTagarwaL-code/claude-code) · specifically: how-it-works, tools, memory-context, permissions, hooks, skills, MCP servers, multi-agent.
Pi vs Claude Code, side-by-side #
Same loop, different philosophies. Claude Code ships features; Pi ships primitives. This chapter compares the implementations for every subsystem we covered in the deep dive. The pattern repeats: Claude Code answers "what feature do you want?", Pi answers "what primitive do you need?"
One-line summary
1. The agent loop
Both run the universal loop from Chapter 10: assemble context, call model, run tool calls (after permission check), append results, repeat until no tool calls.
| Dimension | Claude Code | Pi |
|---|---|---|
| Loop ownership | Anthropic's. Closed source. You intercept around it. | Yours via SDK; open source on github.com/earendil-works/pi-mono. |
| Per-turn budgets | Token + tool-call budgets enforced by the query engine. | Implicit; controlled by the model and your extensions. |
| Tool-result oversize handling | Each tool has maxResultSizeChars; overflow saved to temp file, preview + path returned. | Tool implementer's responsibility; fullOutputPath on BashExecutionMessage is the same idea, surfaced explicitly. |
| Background execution | Background bash with run_in_background: true + notification. | "No background bash. Use tmux for full observability." |
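The oversize rule in the table is easy to reproduce in a harness you own. A sketch, assuming a character budget (`maxChars` and `previewChars` are hypothetical parameters) and Node's temp directory; the returned fullOutputPath echoes the Pi field named in the table, but this is not Pi's code.

```typescript
import { mkdtempSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// If a tool result exceeds maxChars, spill the full text to a temp file and return
// a preview plus the path, so the model can read the rest on demand.
function capToolResult(
  text: string,
  maxChars: number,
  previewChars = 2_000,
): { text: string; fullOutputPath?: string } {
  if (text.length <= maxChars) return { text };
  const dir = mkdtempSync(join(tmpdir(), "tool-output-"));
  const fullOutputPath = join(dir, "output.txt");
  writeFileSync(fullOutputPath, text, "utf8");
  const preview = text.slice(0, Math.min(previewChars, maxChars));
  return {
    text: `${preview}\n\n[output truncated: ${text.length} chars total; full output at ${fullOutputPath}]`,
    fullOutputPath,
  };
}
```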
2. Context loading and memory files
| Dimension | Claude Code | Pi |
|---|---|---|
| What auto-loads | Git status (branch, recent commits, working tree), current date, all CLAUDE.md files in the 4-level hierarchy, the tool list. Memoized via lodash. | Minimal system prompt by design. AGENTS.md files walked up from cwd. Current date and tool list assembled. Custom system prompt via SYSTEM.md or extension. |
| Memory file format | CLAUDE.md. Supports @include directives (up to 5 levels). Path-scoped rules via frontmatter on .claude/rules/*.md. | AGENTS.md. Simpler scope: one file per directory in the walk. |
| Scope levels | 4: managed, user, project, local. Files closer to cwd load later (win the cascade). | 2: ~/.pi/agent/AGENTS.md (global), AGENTS.md in cwd and ancestors (project). Plus per-project SYSTEM.md to replace or append. |
| Path-scoped activation | Yes: paths: ["src/api/**"] in rule frontmatter; only injects when Claude touches matching files. | Same idea reached through skill frontmatter and extension context hook. |
| "Override" framing | Files prefixed with strong language that overrides defaults. | No framing; you control the system prompt entirely. |
| Disabling | CLAUDE_CODE_DISABLE_CLAUDE_MDS=1, --bare, claudeMdExcludes setting. | Just don't put an AGENTS.md there. |
Claude Code's design adds prescription (the hierarchy, the override prefix, the @include directive). Pi's design subtracts to the minimum needed and gives you the system-prompt knob directly. Both end in the same place; one gets you there with a recipe, one gives you the ingredients.
3. Built-in tools
| Tool | Claude Code | Pi |
|---|---|---|
| Read file | Read — text + PDF + image + Jupyter | read — text + image; PDF + notebooks via extension/skill |
| Find files | Glob | find |
| Search content | Grep (ripgrep) | grep (ripgrep) |
| Edit | Edit (exact string replace, uniqueness enforced) + Write | edit + write |
| Shell | Bash persistent session, compound-command checks, background mode | bash; persistent shell behavior; "no background bash" |
| Directory listing | LS | ls |
| Web | WebFetch (HTTPS upgrade, 15-min cache, secondary model extracts), WebSearch | Not built-in. Available via skills/extensions; community packages exist. |
| Sub-agent | Task — spawns a sub-agent with isolated context | Not built-in. Sub-agent delegation via tmux or via an extension. Peer-to-peer via the four-tool comms protocol (Chapter 14). |
| Structured todos | TodoWrite — renders in a panel | Not built-in. "Use a TODO.md file." |
| Notebooks | NotebookEdit | Skill/extension |
| External tools | MCP — auto-discovered tools with mcp__ prefix | Skills with CLI scripts; or build an MCP extension. "What if you don't need MCP?" |
4. Permission model
| Dimension | Claude Code | Pi |
|---|---|---|
| Where permissions live | First-class subsystem with built-in modes. | Extension-implemented. Reference examples: permission-gate.ts, protected-paths.ts. |
| Modes | 4 named modes: default, acceptEdits, plan, bypassPermissions (+ dontAsk, experimental auto). | No built-in modes. Whatever your tool_call hook does is your policy. |
| Rule syntax | Allow/deny/ask lists with wildcard matching: "Bash(git *)", "mcp__server__tool". | Arbitrary TypeScript in a tool_call handler. More expressive, less declarative. |
| Compound-command handling | Built-in: split &&/;/|, check each, most restrictive wins. | You implement this in your handler. Reference snippet in Chapter 11. |
| Plan mode | Built-in: read-only; ExitPlanMode tool to request approval. | Build via extension or install a package. |
| Bypass | bypassPermissions mode with documented warnings. | "Run in a container, or build your own confirmation flow with extensions." |
5. Customization surface — hooks vs extensions vs MCP
This is the biggest philosophical split. All three are ways to inject custom behavior; they answer the same questions very differently.
| Dimension | Claude Code: hooks | Claude Code: MCP | Pi: extensions |
|---|---|---|---|
| What is it | Shell command, HTTP POST, LLM prompt, or full agent — bound to a lifecycle event | External process/server exposing tools over the Model Context Protocol | TypeScript module loaded in-process via jiti |
| Configured as | JSON in settings.json | JSON in .mcp.json / ~/.claude.json; claude mcp add | Default-exported factory function; auto-discovered from ~/.pi/agent/extensions/ or .pi/extensions/ |
| Hook surface | Fixed enum of ~20 events (PreToolUse, PostToolUse, Stop, SessionStart, UserPromptSubmit, PreCompact, ...) | None — MCP is for tool addition, not interception | 30+ typed events, all hooks listed in Chapter 11. Some can block, some can mutate. |
| Adds new tools? | No — hooks decorate existing tools | Yes — primary use case | Yes — pi.registerTool() at load or runtime |
| Can block tool calls? | Yes — exit code 2 on PreToolUse | No (separate permission system gates calls) | Yes — return { block: true, reason } from tool_call |
| Can mutate tool inputs? | Indirect (block + tell Claude to retry differently) | No | Yes — mutate event.input in place |
| Can mutate tool results? | PostToolUse can react but not transform | No | Yes — return partial patch from tool_result |
| Can modify system prompt? | SessionStart stdout becomes context | No | Yes — before_agent_start returns new systemPrompt |
| Can add commands? | Via skills (separate system) or plugins | No | Yes — pi.registerCommand() with autocompletion |
| Can add keybindings? | No | No | Yes — pi.registerShortcut() |
| Process model | Subprocess per hook fire | Long-lived subprocess or HTTP/SSE | In-process, same Node runtime |
| Language | Any (shell) | Any (MCP fixes the wire protocol, not the language) | TypeScript |
| Failure isolation | Hook process can fail without crashing Claude Code | Server can fail; tools become unavailable | Bad extension can crash Pi; you own the runtime |
| Performance | Process spawn per fire | One process, JSON-RPC overhead per call | Function call |
Translated into the talk's framing: Claude Code's hooks let you observe and gate, MCP lets you add, and Pi's extensions let you do everything in one cohesive surface. The cost of Pi's surface is that you write TypeScript and you take responsibility for not crashing your harness. The benefit is that you can add any behavior yourself, without waiting for Anthropic to ship a new hook event.
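To make "block, mutate, patch" concrete, here is a minimal sketch of an in-process hook bus in plain TypeScript. It follows the semantics in the table. A tool_call handler can mutate the input in place or return { block: true, reason }, and a tool_result handler can return a partial patch. HookBus, ToolCallEvent, Executor, and the bash/command names are hypothetical, however. This is not Pi's real ExtensionAPI.

```typescript
// Hypothetical shapes modeled on the table above; not Pi's actual ExtensionAPI types.
interface ToolCallEvent {
  toolName: string;
  input: Record<string, unknown>; // handlers may mutate this in place
}
interface ToolResult {
  content: string;
  isError: boolean;
}
type BlockDecision = { block: true; reason: string } | undefined;
type ToolCallHandler = (e: ToolCallEvent) => BlockDecision | Promise<BlockDecision>;
type ToolResultHandler = (
  e: ToolCallEvent,
  r: ToolResult,
) => Partial<ToolResult> | undefined | Promise<Partial<ToolResult> | undefined>;
type Executor = (input: Record<string, unknown>) => Promise<ToolResult>;

class HookBus {
  private callHandlers: ToolCallHandler[] = [];
  private resultHandlers: ToolResultHandler[] = [];

  onToolCall(h: ToolCallHandler): void {
    this.callHandlers.push(h);
  }
  onToolResult(h: ToolResultHandler): void {
    this.resultHandlers.push(h);
  }

  // Awaits every handler in registration order and respects its return value.
  async runTool(event: ToolCallEvent, exec: Executor): Promise<ToolResult> {
    for (const h of this.callHandlers) {
      const decision = await h(event); // the handler may also have mutated event.input
      if (decision) return { content: `Blocked: ${decision.reason}`, isError: true };
    }
    let result = await exec(event.input);
    for (const h of this.resultHandlers) {
      const patch = await h(event, result);
      if (patch) result = { ...result, ...patch }; // shallow-merge the partial patch
    }
    return result;
  }
}

const bus = new HookBus();

// Gate: block a dangerous command outright.
bus.onToolCall((e) => {
  const cmd = String(e.input.command ?? "");
  return cmd.includes("rm -rf") ? { block: true as const, reason: "rm -rf is not allowed" } : undefined;
});

// Mutate: rewrite the input before the tool runs (illustrative: make bash fail fast).
bus.onToolCall((e) => {
  if (e.toolName === "bash") e.input.command = `set -e; ${String(e.input.command)}`;
  return undefined;
});

// Patch: redact API-key-like strings before the model ever sees the result.
bus.onToolResult((_e, r) => ({ content: r.content.replace(/sk-[A-Za-z0-9]+/g, "[REDACTED]") }));

// Stand-in executor that echoes its input plus a fake secret.
const fakeBash: Executor = async (input) => ({
  content: `ran: ${String(input.command)} -> sk-abc123`,
  isError: false,
});

bus.runTool({ toolName: "bash", input: { command: "echo hi" } }, fakeBash).then((r) => console.log(r.content));
// ran: set -e; echo hi -> [REDACTED]
```

The design point is ordering. The loop awaits each handler in turn, so a later handler sees earlier mutations, and a single block short-circuits execution. Running in-process has a cost: a throwing handler rejects the whole call. It also has a benefit: there is no serialization and no subprocess per event.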
6. Skills
Both tools implement skills against the Agent Skills standard with minor extensions. The frontmatter fields converge; the discovery and invocation models differ in small ways.
| Dimension | Claude Code | Pi |
|---|---|---|
| Format | SKILL.md in .claude/skills/<name>/ | SKILL.md in ~/.pi/agent/skills/, .pi/skills/, .agents/skills/, etc. |
| Frontmatter | description, argument-hint, allowed-tools, when_to_use, model, user-invocable, context: fork, paths, hooks | Standard name + description + optional license, compatibility, metadata, allowed-tools, disable-model-invocation |
| Argument substitution | $ARGUMENTS + named args via arguments: [name, dir] then $name | Args appended to skill content as User: <args> on /skill:name args |
| Inline shell at invocation | Yes: !`git log -20` runs and inserts output | No special syntax; skills can describe scripts to run via tools |
| Path-activated | Yes via paths | Skills always discoverable; activation up to the model based on description |
| Per-skill model | Yes via model: | No (use an extension's before_agent_start hook to switch) |
| Subagent fork | Yes via context: fork | Not built-in |
| Bundled skills | Yes — compiled into the binary | No; install from anthropics/skills or pi-skills |
| Cross-harness skill sharing | Skills are CC-specific by default but standard-compliant | Pi can load CC skill directories: add ~/.claude/skills to the skills array in settings |
Pi's nontrivial move: it can adopt the Claude Code skill ecosystem wholesale. The standard is the same; Pi is the more lenient implementation.
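Both harnesses rely on progressive disclosure. In Pi, only each skill's name and description reach the system prompt, and the full SKILL.md loads on demand. Below is a minimal sketch of that indexing step in plain TypeScript. It assumes flat key: value frontmatter between --- lines, with no nested YAML and no multi-line values. parseSkill, skillIndex and the release-notes skill are hypothetical.

```typescript
interface Skill {
  name: string;
  description: string;
  body: string; // loaded into context only when the skill is invoked
  meta: Record<string, string>; // every frontmatter key, including optional ones
}

// Parse "---\nkey: value\n---\nbody". Simplified: flat keys only, no YAML features.
function parseSkill(text: string): Skill | null {
  const m = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(text);
  if (!m) return null;
  const meta: Record<string, string> = {};
  for (const line of m[1].split(/\r?\n/)) {
    const idx = line.indexOf(":");
    if (idx > 0) meta[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  if (!meta.name || !meta.description) return null; // this sketch requires both
  return { name: meta.name, description: meta.description, body: m[2], meta };
}

// Only this index goes into the system prompt; bodies stay on disk until needed.
function skillIndex(skills: Skill[]): string {
  return skills.map((s) => `- ${s.name}: ${s.description}`).join("\n");
}

const demo = parseSkill(
  "---\nname: release-notes\ndescription: Draft release notes from merged PRs\n---\n# Steps\n1. ...",
);
console.log(skillIndex(demo ? [demo] : []));
// - release-notes: Draft release notes from merged PRs
```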
7. Multi-agent
| Dimension | Claude Code | Pi |
|---|---|---|
| Topology | Top-down: parent spawns child via Task. Strict tree. Sub-agents do not see siblings. Sub-agents do not spawn teammates (flat roster); they can spawn their own children. | Flat by default: every agent is a peer. Optional orchestrator pattern by convention. |
| Communication | One-way: parent passes prompt, child returns one final result. | Bidirectional: four-tool protocol (list_agents, send_to_agent, await_reply, check_inbox). See Chapter 14. |
| Context | Fresh window (or inherit if forked). Result capped at 100,000 chars. | Each peer has its own session; messages flow between them through the comms extension. |
| Process | Local in-process or remote (when eligible). Background mode supported. | Multiple Pi processes (tmux, separate machines via comms-net HTTP). |
| Isolation | isolation: "worktree" gives each agent its own git worktree. | Process isolation by default; worktree via tmux + git. |
| Persistent memory | Per agent type: ~/.claude/agent-memory/<type>/MEMORY.md. | Per agent (named via flag). Sessions are persistent already. |
| Cancellation | Background agents survive parent's Escape; cancel via tasks panel. | Per-process; session_shutdown hooks fire on each. |
This is the deepest architectural divergence. Claude Code's multi-agent is delegation. Pi's is collaboration. Each subsumes the other in theory; in practice, the topology you start with shapes what kinds of work you'll do.
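The four-tool protocol fits on one screen. Below is a minimal in-memory version in plain TypeScript. It assumes a single process, which is the comms case rather than comms-net's HTTP transport. The method names mirror list_agents, send_to_agent, await_reply and check_inbox. AgentPool, its signatures and the replyTo field are hypothetical, though, and do not reproduce the extension's actual code.

```typescript
// Hypothetical in-process pool; not the comms extension's actual code.
interface Message {
  id: number;
  from: string;
  to: string;
  text: string;
  replyTo?: number; // id of the message this answers, if any
}

class AgentPool {
  private inboxes = new Map<string, Message[]>();
  private waiters = new Map<number, (m: Message) => void>();
  private nextId = 1;

  join(name: string): void {
    if (!this.inboxes.has(name)) this.inboxes.set(name, []);
  }

  // list_agents: every peer can see every other peer (flat topology, no orchestrator).
  listAgents(self: string): string[] {
    return Array.from(this.inboxes.keys()).filter((n) => n !== self);
  }

  // send_to_agent: a reply wakes a pending await_reply; anything else lands in the inbox.
  sendToAgent(from: string, to: string, text: string, replyTo?: number): number {
    const inbox = this.inboxes.get(to);
    if (!inbox) throw new Error(`unknown agent: ${to}`);
    const msg: Message = { id: this.nextId++, from, to, text, replyTo };
    const waiter = replyTo === undefined ? undefined : this.waiters.get(replyTo);
    if (waiter && replyTo !== undefined) {
      this.waiters.delete(replyTo);
      waiter(msg);
    } else {
      inbox.push(msg);
    }
    return msg.id;
  }

  // check_inbox: non-blocking; drains and returns pending messages.
  checkInbox(self: string): Message[] {
    const inbox = this.inboxes.get(self) ?? [];
    return inbox.splice(0, inbox.length);
  }

  // await_reply: resolves with the reply to message `id`, even if it already arrived.
  awaitReply(self: string, id: number): Promise<Message> {
    const inbox = this.inboxes.get(self) ?? [];
    const i = inbox.findIndex((m) => m.replyTo === id);
    if (i >= 0) return Promise.resolve(inbox.splice(i, 1)[0]);
    return new Promise((resolve) => this.waiters.set(id, resolve));
  }
}

const pool = new AgentPool();
pool.join("builder");
pool.join("verifier");
const id = pool.sendToAgent("builder", "verifier", "Please review src/auth.ts");
const reply = pool.awaitReply("builder", id);
// The verifier, as a peer, reads its own inbox and answers.
for (const m of pool.checkInbox("verifier")) {
  pool.sendToAgent("verifier", m.from, "LGTM with one nit", m.id);
}
reply.then((m) => console.log(`${m.from}: ${m.text}`)); // verifier: LGTM with one nit
```

Note the asymmetry with a Task-style call. The builder did not spawn the verifier, and the verifier does not disappear after answering. It stays in the pool and can start conversations of its own.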
8. Sessions and compaction
| Dimension | Claude Code | Pi |
|---|---|---|
| Storage | JSON transcripts in ~/.claude/. Session ID assigned at start. | JSONL (one entry per line) in ~/.pi/agent/sessions/--<path>--/<ts>_<uuid>.jsonl. Versioned (v3). See Chapter 12. |
| Tree structure | Linear transcript. | Tree via id/parentId. Branching is moving the leaf back; abandoned branch stays in the file. |
| Resume | --resume <id> or --resume for picker. Memory re-discovered (may differ from original). | /resume in TUI; SessionManager.continueRecent() in SDK. |
| Branching / fork | Not first-class; sessions are linear. | First-class: /fork, /clone, /tree navigation, in-place branch via sm.branch(entryId). |
| Compaction trigger | Auto: oldest messages summarized when window fills. Raw transcript preserved. | Auto: when contextTokens > contextWindow - reserveTokens (default reserve 16,384). Or /compact [instructions]. |
| Compaction algorithm | Implementation detail; the docs commit to "preserves the raw transcript." | Documented in full: keep recent 20k tokens, summarize earlier, structured summary format (Goal / Progress / Decisions / Next / Critical Context + tagged files). See Chapter 12. |
| Custom compaction | PreCompact hook can inject instructions (exit 0 stdout) or block (exit 2). | Full custom compaction via session_before_compact extension hook: provide your own summary with custom data in details. |
| Branch summarization | N/A (no branches). | When you navigate the tree, Pi offers to summarize the abandoned branch and inject the summary into the new branch. |
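Two of the rows above are compact enough to sketch: rebuilding the active branch from id/parentId pointers, and the compaction trigger plus cut point. The sketch below is plain TypeScript. It assumes a simplified entry shape with a token count already attached to each entry; real Pi entries are typed and carry more fields. SessionEntry, activePath, shouldCompact and findCutIndex are hypothetical names. The defaults are the figures quoted in the table: a 16,384-token reserve and 20,000 recent tokens kept.

```typescript
// Simplified entry: real Pi entries have a `type` plus type-specific fields.
interface SessionEntry {
  id: string;
  parentId: string | null;
  tokens: number;
  text: string;
}

// The active branch is the chain of parentId pointers from the leaf back to the root.
// Entries on abandoned branches stay in the file but are not on this path.
function activePath(entries: SessionEntry[], leafId: string): SessionEntry[] {
  const byId = new Map(entries.map((e) => [e.id, e] as const));
  const path: SessionEntry[] = [];
  const seen = new Set<string>(); // guards against a malformed cycle
  let cur = byId.get(leafId);
  while (cur && !seen.has(cur.id)) {
    seen.add(cur.id);
    path.push(cur);
    cur = cur.parentId === null ? undefined : byId.get(cur.parentId);
  }
  return path.reverse();
}

// Trigger from the table: contextTokens > contextWindow - reserveTokens.
function shouldCompact(contextTokens: number, contextWindow: number, reserveTokens = 16384): boolean {
  return contextTokens > contextWindow - reserveTokens;
}

// Walk back from the newest entry until keepRecentTokens is covered. Entries before
// the returned index would be summarized; entries from it onward are kept verbatim.
// (This sketch ignores turn boundaries; a production version would need to avoid
// separating a tool call from its result.)
function findCutIndex(path: SessionEntry[], keepRecentTokens = 20000): number {
  let kept = 0;
  for (let i = path.length - 1; i >= 0; i--) {
    kept += path[i].tokens;
    if (kept >= keepRecentTokens) return i;
  }
  return 0; // everything fits: nothing to summarize
}

// Root "a" with two children: "b" (abandoned branch) and "c" -> "d" (active branch).
const entries: SessionEntry[] = [
  { id: "a", parentId: null, tokens: 12000, text: "user: plan the refactor" },
  { id: "b", parentId: "a", tokens: 3000, text: "assistant: approach 1" },
  { id: "c", parentId: "a", tokens: 9000, text: "assistant: approach 2" },
  { id: "d", parentId: "c", tokens: 15000, text: "tool: test output" },
];
const path = activePath(entries, "d");
console.log(path.map((e) => e.id).join(" -> ")); // a -> c -> d
console.log(shouldCompact(36000, 50000)); // true (36000 > 50000 - 16384 = 33616)
console.log(findCutIndex(path)); // 1 (keep c and d: 9000 + 15000 >= 20000)
```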
9. Providers and models
| Dimension | Claude Code | Pi |
|---|---|---|
| Providers | Claude models only: via Anthropic directly (Claude Pro/Max subscription or Anthropic Console billing), or Claude served through Amazon Bedrock or Google Vertex AI. | 15+ built-in: Anthropic, OpenAI, Google, Azure, Bedrock, Mistral, Groq, Cerebras, xAI, HuggingFace, Kimi For Coding, MiniMax, OpenRouter, Ollama. Plus custom via pi.registerProvider(). |
| Auth | OAuth (Pro/Max) or API key (Console). apiKeyHelper script. forceLoginMethod for enterprise. | API keys via env or auth.json; OAuth supported for any provider via pi.registerProvider({ oauth: {...} }); runtime override via setRuntimeApiKey(). |
| Model switching | --model at launch or /model mid-session. | --model, /model, or Ctrl+L. Cycle favorites with Ctrl+P. Per-session scopedModels. |
| Thinking level | alwaysThinkingEnabled + effortLevel (low/medium/high). | 6 levels: off, minimal, low, medium, high, xhigh. pi.setThinkingLevel() at runtime. |
| Enterprise model lockdown | availableModels managed setting (allowlist). | Custom models.json per-org; ModelRegistry filtering. |
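The provider/API-kind split described in the glossary is what lets many providers share a handful of adapters. Below is a minimal sketch in plain TypeScript. The provider table, the "provider/model" spec format, the model names and the helper names are hypothetical illustrations. They are not Pi's models.json schema or its ModelRegistry API. The API-kind strings and the six thinking levels come from this manual. The base URLs are the providers' public Anthropic- or OpenAI-compatible endpoints.

```typescript
type ApiKind = "anthropic-messages" | "openai-completions" | "openai-responses";
const THINKING_LEVELS = ["off", "minimal", "low", "medium", "high", "xhigh"] as const;
type ThinkingLevel = (typeof THINKING_LEVELS)[number];

interface ProviderInfo {
  api: ApiKind; // wire format: which request/stream adapter to use
  baseUrl: string; // network endpoint
}

// Illustrative mapping (assumed, not Pi's built-in table): several providers share one wire format.
const providers: Record<string, ProviderInfo> = {
  anthropic: { api: "anthropic-messages", baseUrl: "https://api.anthropic.com" },
  openai: { api: "openai-responses", baseUrl: "https://api.openai.com/v1" },
  groq: { api: "openai-completions", baseUrl: "https://api.groq.com/openai/v1" },
  ollama: { api: "openai-completions", baseUrl: "http://localhost:11434/v1" },
};

interface ResolvedModel extends ProviderInfo {
  provider: string;
  model: string;
}

// Resolve a hypothetical "provider/model" spec into endpoint + adapter.
function resolveModel(spec: string): ResolvedModel {
  const slash = spec.indexOf("/");
  if (slash <= 0) throw new Error(`expected provider/model, got "${spec}"`);
  const provider = spec.slice(0, slash);
  const info: ProviderInfo | undefined = providers[provider];
  if (!info) throw new Error(`unknown provider: ${provider}`);
  return { provider, model: spec.slice(slash + 1), ...info };
}

// Cycle to the next favorite, wrapping around (the behavior a "cycle favorites" key implies).
function nextFavorite(favorites: string[], current: string): string {
  if (favorites.length === 0) return current;
  const i = favorites.indexOf(current); // -1 if current is not a favorite, so start at 0
  return favorites[(i + 1) % favorites.length];
}

// Validate a thinking level against the six listed levels.
function parseThinkingLevel(s: string): ThinkingLevel {
  const match = THINKING_LEVELS.find((level) => level === s);
  if (match === undefined) throw new Error(`unknown thinking level: ${s}`);
  return match;
}

const local = resolveModel("ollama/local-model");
console.log(local.api, local.baseUrl); // openai-completions http://localhost:11434/v1
console.log(nextFavorite(["anthropic/model-a", "ollama/local-model"], "ollama/local-model")); // anthropic/model-a
console.log(parseThinkingLevel("high")); // high
```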
10. Programmatic surfaces
| Surface | Claude Code | Pi |
|---|---|---|
| One-shot prompt | claude -p "prompt" (stdin / print mode) | pi -p "prompt" + --mode json for event stream |
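| JSON event stream | claude -p --output-format stream-json emits JSON lines in print mode; interactive sessions expose only the transcript (Ctrl+O) and hook stdin | pi --mode json writes session header + every AgentSessionEvent as JSONL |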
| RPC subprocess | None (use claude mcp serve to expose CC as an MCP server instead) | pi --mode rpc: JSONL over stdin/stdout; full command + event surface |
| Embedded SDK | Not exposed as a public Node SDK; the binary is the interface. | @earendil-works/pi-coding-agent SDK: createAgentSession(), typed events, custom tools at the SDK layer. |
| As an MCP server | claude mcp serve — turn Claude Code itself into an MCP endpoint | Build via extension if you need it; not a built-in mode |
| Extension UI from headless | N/A — hooks are headless | Extension UI sub-protocol over RPC: dialogs, status, widgets relayed to the client |
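Both pi --mode json and pi --mode rpc speak JSONL, meaning one JSON value per LF-terminated line. The one subtlety for a client is that stdout arrives in arbitrary chunks, so a single line can be split across reads. Below is a minimal sketch of a client-side framer in plain TypeScript. JsonlDecoder and encodeCommand are hypothetical helpers, and the event and command shapes are placeholders, not Pi's documented RPC messages. LF is a safe delimiter because JSON.stringify never emits a raw newline; it escapes newlines inside strings.

```typescript
// Buffers partial chunks and emits one parsed value per complete LF-terminated line.
class JsonlDecoder {
  private buffer = "";

  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const out: unknown[] = [];
    let nl: number;
    while ((nl = this.buffer.indexOf("\n")) >= 0) {
      const line = this.buffer.slice(0, nl).replace(/\r$/, ""); // tolerate CRLF
      this.buffer = this.buffer.slice(nl + 1);
      if (line.trim().length > 0) out.push(JSON.parse(line));
    }
    return out; // an incomplete trailing line stays buffered for the next chunk
  }
}

// Commands go the other way: serialize, then terminate with a single LF.
function encodeCommand(command: object): string {
  return JSON.stringify(command) + "\n";
}

const decoder = new JsonlDecoder();
// The second event is split across two reads; it is parsed only once its LF arrives.
console.log(decoder.push('{"type":"event_a"}\n{"type":"ev').length); // 1
console.log(decoder.push('ent_b","n":1}\n').length); // 1
console.log(encodeCommand({ type: "example_command" }).endsWith("\n")); // true
```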
11. Settings and packaging
| Dimension | Claude Code | Pi |
|---|---|---|
| Settings scopes | 4: user (~/.claude/settings.json), project (.claude/settings.json), local (.claude/settings.local.json), managed (MDM/registry/plist). | 2 by default: user (~/.pi/agent/settings.json), project (.pi/settings.json). Project overrides global. Enterprise lockdown via filesystem permissions. |
| JSON schema | Yes: https://schemas.anthropic.com/claude-code/settings.json | TypeBox-typed in source; no public hosted schema URL |
| Package format | Plugins (npm/git) carrying skills, agents, hooks, MCP. Managed settings can lock to plugin-only sources. | Pi packages: npm:/git: refs in settings carrying extensions + skills + prompts + themes. Filtered via the packages array. |
| Versioned pinning | npm semver, git refs. | Same: npm:@foo/pkg@1.2.3, git:host/user/repo@v1. Versioned specs skip pi update. |
| Try-without-installing | N/A as a first-class concept | pi -e <source> installs to temp for the run |
| Updating | claude update | pi update (Pi + packages), --self (Pi only), --extensions (packages only), per-package update |
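The pinning rule in the table, where versioned specs skip pi update, is easy to get wrong by hand. Below is a minimal sketch in plain TypeScript that parses the two spec forms shown above. parsePackageSpec and packagesToUpdate are hypothetical. The sketch assumes that a trailing @version marks a pin, inferred from the two examples in the table. It is not Pi's actual parser. SSH-style git sources that contain user@host are out of scope.

```typescript
interface PackageSpec {
  source: "npm" | "git";
  name: string;
  version: string | null; // null means unpinned
}

// "npm:@foo/pkg@1.2.3" -> npm, "@foo/pkg", "1.2.3"; "git:host/user/repo@v1" -> git, "host/user/repo", "v1"
function parsePackageSpec(spec: string): PackageSpec {
  const m = /^(npm|git):(.+)$/.exec(spec);
  if (!m) throw new Error(`unsupported spec: ${spec}`);
  const source = m[1] === "npm" ? "npm" : "git";
  const rest = m[2];
  // The version separator is the last "@", unless that "@" is the leading npm scope marker.
  const at = rest.lastIndexOf("@");
  if (at > 0) return { source, name: rest.slice(0, at), version: rest.slice(at + 1) };
  return { source, name: rest, version: null };
}

// An update pass refreshes unpinned packages and leaves pinned ones alone.
function packagesToUpdate(specs: string[]): string[] {
  return specs.filter((s) => parsePackageSpec(s).version === null);
}

const specs = ["npm:@foo/pkg@1.2.3", "npm:@foo/other", "git:host/user/repo@v1", "git:host/user/tools"];
console.log(packagesToUpdate(specs).join(", ")); // npm:@foo/other, git:host/user/tools
```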
12. Distribution and openness
| Dimension | Claude Code | Pi |
|---|---|---|
| License | Closed (Anthropic). | MIT. Source at earendil-works/pi-mono. |
| Vendor relationship | You depend on Anthropic. | You depend on Earendil Inc. for upstream; you can fork. |
| Telemetry/account | Subscription account or Console billing. | None unique to Pi; depends on provider you authenticate to. |
| Roadmap influence | Anthropic-driven; community can file issues. | Same plus extensions: anything you wish existed, you can implement. |
The pattern, summarized
Across every row above, the structural difference repeats: Claude Code picks reasonable defaults and exposes a configuration surface; Pi exposes the primitive and lets you build the default. The talk's "floor vs ceiling" framing is literal — Claude Code is what you get without effort; Pi is what becomes possible with effort.
Nothing above implies one tool is better. They optimize for different users. Claude Code optimizes for the engineer who wants a great agent immediately. Pi optimizes for the engineer who wants a custom agent eventually. Most teams will use one or both depending on the task. The choice is the topic of the next chapter.
Claude Code data from the community wiki: mintlify.wiki/VineeTagarwaL-code/claude-code (each page there cites its own upstream source). Pi data from pi.dev/docs/latest and the source on github.com/earendil-works/pi-mono.
Selection guide — when each fits #
"Which one should I use?" has three honest answers, not one. This chapter is the decision framework: when Claude Code is the right floor, when adopting Claude Code plus pushing on the customization surface is the move, and when owning the harness via Pi pays back the investment.
Three scenarios, three answers
Scenario A — Claude Code, out of the box
You should pick this when:
- You want the best agentic coding experience available with zero setup beyond typing claude.
- Your work is general software engineering: fixes, features, refactors, exploration.
- You're happy with Anthropic models; provider lock-in is a non-issue for you.
- The customization you need fits in CLAUDE.md, permission rules, and one or two hooks.
- You value polish, predictable updates, and "Anthropic operates this for me."
What this looks like in practice: one CLAUDE.md in the project root, allow-rules for your common bash commands, two hooks (Prettier on PostToolUse + npm test on Stop), one or two skills for your team's repeated workflows. You'll stop here for months.
Scenario B — Claude Code, push the surface
You should pick this when:
- Scenario A is most of your work but a specific class of task needs external systems Claude doesn't reach (databases, internal APIs, design tools, observability).
- You want explicit safety policies enforced (block rm -rf, sandbox bash to a container, require human approval for production access).
- Your team needs shared workflows beyond CLAUDE.md (multi-step deployment, structured PR reviews, code generators).
- You'll occasionally write a hook that's a real program (validation, classification, integration).
What this looks like in practice: the things from Scenario A, plus 3–8 MCP servers (your DB, ticket tracker, deploy tool, ...), 5–15 skills, a few non-trivial hooks (LLM-prompt hooks or full agent hooks for verification), and a managed-settings policy if you're at an org that needs lockdown. This is the practical ceiling for most teams.
Scenario C — Pi, own the harness
You should pick this when:
- You need a behavior that requires mutating tool inputs/outputs, intercepting the system prompt, or replacing compaction. That class of behavior is unreachable from Claude Code's hook enum.
- You need to use providers other than Anthropic (Bedrock for compliance, OpenAI for a specific capability, Ollama for offline, your in-house gateway, mid-session model switching across all of them).
- You want peer-to-peer multi-agent communication, not top-down delegation (see Chapter 14). Or you want to swap topologies as you learn.
- You want the session tree (branch, fork, clone, in-place navigation) as a first-class object you can manipulate.
- You're shipping a product on top of an agent loop and you need the SDK and RPC surfaces.
- You believe the architectural framing from the talk: harness ownership compounds, and the cost of owning is cheaper than the cost of waiting for Anthropic to ship the feature you need.
What this looks like in practice: a small .pi/ directory with a few extensions (permission gate, redactor, the comms-net extension if you're doing peer-to-peer), a SYSTEM.md per-project, a couple of skills imported from anthropics/skills, and an in-house pi package you share via git that bundles your team's extensions and prompts. You spend more time building, and you stop being blocked.
The decision framework, in five questions
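- Do you need models beyond Claude? If you must use OpenAI, local models, or an in-house gateway to other model families, choose Pi. Claude Code runs only Claude models (including through Bedrock or Vertex AI), so it does not solve this.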
- Do you need to intercept or mutate inside the loop? Mutating tool inputs, redacting tool results before the LLM sees them, replacing compaction with your own algorithm, modifying the system prompt per-turn — Pi. Claude Code's hooks observe and gate; they do not transform.
- Do you need peer-to-peer multi-agent or branching topologies? If your work model is "agents that talk to each other as equals" or "explore three approaches in branches I can switch between", choose Pi. Claude Code's Task tool is strict top-down delegation with linear sessions.
- Are you building a product on top? If you need an SDK in Node or RPC from another language, Pi has both as first-class surfaces. Claude Code's headless surfaces are aimed at scripts and CI.
- None of the above? Claude Code. Carrying your own harness isn't worth the polish you'd be giving up.
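The five questions reduce to an ordered checklist where the first "yes" decides. Encoding them makes the structure explicit. The sketch below is an illustrative restatement of the list above in plain TypeScript. Needs, Choice, chooseHarness and the field names are hypothetical. This is not a tool from the talk.

```typescript
// Hypothetical encoding of the five questions above; the first "yes" decides.
interface Needs {
  beyondAnthropic: boolean; // Q1: providers or models Claude Code cannot reach
  loopMutation: boolean; // Q2: mutate inputs/results, replace compaction, per-turn prompt edits
  peerOrBranching: boolean; // Q3: peer-to-peer agents or branching session topologies
  productEmbedding: boolean; // Q4: a Node SDK or RPC from another language
}

interface Choice {
  harness: "pi" | "claude-code";
  reason: string;
}

function chooseHarness(n: Needs): Choice {
  if (n.beyondAnthropic) return { harness: "pi", reason: "provider freedom" };
  if (n.loopMutation) return { harness: "pi", reason: "in-loop interception and mutation" };
  if (n.peerOrBranching) return { harness: "pi", reason: "peer topology and the session tree" };
  if (n.productEmbedding) return { harness: "pi", reason: "SDK and RPC surfaces" };
  return { harness: "claude-code", reason: "none of the above: keep the polish" }; // Q5
}

const noNeeds: Needs = { beyondAnthropic: false, loopMutation: false, peerOrBranching: false, productEmbedding: false };
console.log(chooseHarness(noNeeds).harness); // claude-code
console.log(chooseHarness({ ...noNeeds, loopMutation: true }).reason); // in-loop interception and mutation
```

Any "yes" leads to Pi, so the order matters less than it looks. What matters is the reason string. It is the "specific, named reason" this chapter asks you to have before you leave the floor.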
What "use both" looks like
It's a common pattern. The talk's own framing is "I still use Claude Code all the time" alongside Pi. A reasonable split:
- Claude Code for daily IDE-like work, exploration, one-off bug fixes, code review.
- Pi for production agentic systems, custom pipelines, anything that runs unattended, anything that needs a model other than Claude, anything that requires deep customization of the loop.
The dividing line is roughly "tool you use to think and write code" vs "tool you embed in a system that runs without you." Both are agent harnesses; their target users overlap but their optimization targets don't.
Migration costs, honestly
| From | To | What you lose | What you gain |
|---|---|---|---|
| Claude Code | Pi | 4-level CLAUDE.md hierarchy (collapses to 2 levels), built-in Task tool (rebuild with comms), built-in permission modes (rebuild with extension), TodoWrite panel, attribution defaults, polish on a thousand small things. | Provider freedom, full loop control, the tree, mid-session model cycling, SDK + RPC + JSON modes, primitives over features. |
| Pi | Claude Code | Provider variety, the tree, peer-to-peer comms, extension-mutated inputs/outputs, SDK access. | Polish, sub-agents as a built-in, MCP ecosystem ready-made, plan mode, managed-settings lockdown for enterprise, 4-level memory hierarchy out of the box. |
| Both | One | The strengths of whichever tool you drop. | Lower operational complexity, no switching cost, clear ownership of the workflow. |
The hardest question
Most teams pick the wrong tool not because they misjudged the tools but because they misjudged themselves. The honest version of the decision is:
- "I will write extensions" — if true, Pi pays back. If you say it but you won't, you'll get worse results than just running Claude Code.
- "I need the customizations" — if real customer behavior depends on them, Pi pays back. If they're "nice to have," you'll spend more time building the harness than using it.
- "I want the leverage" — only true if you have the kind of work where leverage compounds (recurring patterns, multi-step pipelines, things you'll run thousands of times). If your work is bespoke one-offs, owning the harness costs more than it earns.
Use Claude Code unless you have a specific, named reason not to. The reasons are real and the talk catalogs them; absent those reasons, the polish wins. If you have the reasons, Pi pays back faster than you think because the loop is small (Chapter 10) and the API is cheap to extend (Chapter 11).
The talk's framing, reread
"Cloud Code is the floor. It's not the ceiling. It's just the beginning of what's possible with tools like this." Read literally: Claude Code is what's available without effort. Pi is what's available with the effort of owning your harness. Most engineers should stop at the floor most days. The top 2% the talk refers to are the engineers who picked the right days to push past it.
The question is not "which tool wins." The question is "what work am I trying to compound?" If your work compounds — recurring patterns, repeatable pipelines, factories rather than features — the harness you control returns that compounding to you. If your work doesn't compound, a great floor is enough.
Glossary #
- Agentic engineering
- The process of engineering with intelligence that can operate on your behalf. Distinct from prompt-tuning (configuring a single agent's behavior) and traditional software engineering (writing the logic yourself).
- Agent harness
- The runtime that hosts an LLM-driven loop. Owns the system prompt, tool registry, context window strategy, I/O channels, permission model, and subprocess lifecycle.
- Software factory
- A system of agents plus deterministic code that produces engineering output on spec, repeatably, from a single prompt. Stages typically include plan, plan-review, scout, validate, build, test, review.
- ADW — AI Developer Workflow
- The speaker's term for a software factory pipeline. Combines agents and code to outperform either alone.
- Dark factory
- Industry term for a software factory that runs without a human on the critical path. Borrowed from "lights-out" manufacturing.
- ZTE — Zero Touch Engineering
- The asymptote where a prompt produces a production-ready release with no human intervention. Stated as out of scope for most teams today.
- Extensible software
- Software architected so that change is added via new modules at well-defined extension points rather than by modifying existing modules. The Open-Closed Principle, restated for the agentic era.
- AFK agent
- An always-on agent that produces value while the operator is away from keyboard. The ceiling, not the entry move. Earned by first proving the token arbitrage.
- Tokenomics
- The three-level funnel of token spend: maximize spend (level 1), make spend useful (level 2), capture revenue from the value created (level 3). Always-on is only justified at level 3.
- Token max
- Spending tokens without yet tying them to outcomes. A necessary first move, a terrible place to finish.
- Token arbitrage
- The gap between the cost of a token and the value (in revenue or time) the token produces when routed through your system.
- Token tax
- Unnecessary token spend caused by missing API access. An agent that scrapes, parses, retries, or asks the human is paying a tax that the right tool surface would eliminate.
- Agentic access
- The set of APIs, CLIs, RPC endpoints, and webhooks an agent can programmatically reach. The scope of what the agent can do for you.
- Agentic speed
- The execution rate of an agent operating on digital information. Stated by the speaker as 10x to 1000x human speed, gated entirely by whether the agent has access to the relevant tool surface.
- Pi (the agent)
- A minimal terminal coding harness from Earendil Inc. Used in the talk as the example of an extensible harness. Homepage: pi.dev.
- Peer-to-peer agent communication
- A flat topology where every agent can talk to every other agent as an equal. No orchestrator. Information flows bidirectionally. Contrast with sub-agent delegation, message-queue, and agent-chain topologies.
- Pi-to-Pi (or "pietoie")
- The speaker's name for peer-to-peer communication between Pi agents. Implemented as a four-tool extension (list, send, await, check) over either an in-process pool or a Bun HTTP server.
- comms / comms-net
- The two reference extensions in the "Pi vs Cloud Code" repo. comms is single-device and in-process; comms-net adds a lightweight HTTP server so agents on different machines can join the pool.
- Verifier pattern
- A second agent whose job is to check the work of the primary agent. Increases token spend, decreases error rate. In peer-to-peer, the verifier is a peer rather than a parent.
- Focused context window
- The discipline of keeping each agent's context narrow to one task. "A focused agent is a performant agent." Larger context windows do not remove the discipline; they raise the temptation to ignore it.
- Context engineering
- Not getting all the right things into the window. Getting just the right things. The art of choosing what to include, what to summarize, and what to leave out.
- Flat information hierarchy
- An organizational structure (or agent topology) where ideas can travel between any two participants without going up and back down a chain of command. Argued to outperform hierarchical structures because the best information often lives at the bottom.
- Agent loop
- The universal cycle every coding agent runs: build context, call model, append response, execute tool calls, append results, repeat until no more tool calls. ~60 lines of code; see Chapter 10.
- Tool registry
- A dictionary of named functions exposed to the model. Each tool has a JSONSchema-typed parameter set, a description shown to the model, and an executor that returns { content, details, isError }.
- Context strategy
- The pure function that takes the current session and produces the message list the model sees. Owns compaction, branch-summary injection, and tool-result truncation.
- Hook bus (extension bus)
- The typed pub/sub layered over the agent loop. Extensions subscribe to lifecycle events; the loop awaits their handlers and respects their return values. The architectural lever that makes harness ownership cheap.
- JSONL session
- The append-only file format Pi uses for sessions. One JSON object per line: the first line is the header, and every subsequent line is a typed entry whose id/parentId fields form a tree.
- Session entry
- A single line in the JSONL session file. Typed: message, compaction, branch_summary, custom, custom_message, model_change, thinking_level_change, label, session_info.
- Tree (in a session)
- The structure formed by entries' parentId pointers. Branching is moving the leaf back; the abandoned branch stays in the file but is no longer on the active path.
- Compaction
- Pi's mechanism for keeping a long conversation within the model's context window. Walks back collecting tokens, summarizes everything earlier than keepRecentTokens into a CompactionEntry, and rebuilds context from [summary, kept...].
- Branch summary
- A summary of an abandoned branch, generated when the user navigates the tree to a different leaf. Travels with the new branch so context isn't lost.
- Structured summary format
- Pi's summarization template: Goal / Constraints / Progress (Done, In Progress, Blocked) / Key Decisions / Next Steps / Critical Context, plus <read-files> and <modified-files> tags. Keeps the model from treating the summary as a conversation to continue.
- Steer vs follow-up
- Two ways to queue a message while the agent is streaming. steer is delivered after the current tool call, before the next LLM call. follow-up waits until the agent has fully stopped.
- RPC mode
- Pi's subprocess protocol: JSON commands on stdin (one per LF-delimited line), JSON events and responses on stdout. The contract is in the RPC docs.
- SDK (AgentSession)
- Pi's in-process API. createAgentSession() returns an AgentSession with prompt(), steer(), followUp(), subscribe(), model controls, and tree navigation.
- Extension factory
- The default-exported function in a Pi extension file. Receives ExtensionAPI; may be sync or async. Returning a Promise makes Pi wait before session_start fires.
- ExtensionAPI / ExtensionContext
- The two surfaces an extension sees. ExtensionAPI is the set of methods on pi (register tools, commands, providers, shortcuts; send messages; control state). ExtensionContext is passed to every handler and exposes ctx.ui, ctx.sessionManager, ctx.signal, etc.
- Skill (Pi)
- A capability package with a SKILL.md and freeform supporting files. Discovered on startup; only descriptions go in the system prompt. Full content loads on demand via read or /skill:name. Follows the Agent Skills standard.
- Prompt template
- A reusable prompt stored as a Markdown file. Invoked with /name; expanded to the file content before sending.
- Pi package
- A bundle of extensions, skills, prompt templates, and/or themes shared via npm or git. Manifest in package.json under the pi key, or auto-discovered from convention directories.
- Reserve tokens / keep-recent tokens
- The two knobs that govern Pi's compaction. reserveTokens (default 16,384) is space saved for the model's response. keepRecentTokens (default 20,000) is the trailing window kept verbatim.
- Provider / API kind
- Pi separates the network endpoint (provider: Anthropic, OpenAI, Bedrock, Ollama...) from the wire format (api kind: anthropic-messages, openai-completions, openai-responses, ...). 15+ providers map onto ~5 API kinds.
- Claude Code
- Anthropic's terminal-based coding agent. Closed source, Anthropic-only. The "floor" in this manual's framing: best-in-class out-of-the-box experience, with a customization surface bounded by what hooks and MCP expose.
- CLAUDE.md hierarchy
- Claude Code's 4-level memory system: managed (/etc/claude-code/CLAUDE.md), user (~/.claude/CLAUDE.md), project (any ancestor CLAUDE.md), local (CLAUDE.local.md). Files closer to cwd load later and win the cascade.
- @include directive
- Claude Code's mechanism for composing CLAUDE.md from multiple files: @./path, @~/path, @/abs/path. Max 5 levels deep, circular refs detected. Ignored inside fenced code blocks.
- Path-scoped rules
- Claude Code's .claude/rules/*.md files with a paths: frontmatter field. The rule only enters context when Claude is working on a matching file, which keeps context lean.
- Permission mode (Claude Code)
- One of default (ask on dangerous), acceptEdits (auto-approve edits, ask on bash), plan (read-only), bypassPermissions (skip checks). Set per-session or per-project via settings.
- Permission rule
- An allow/deny/ask entry in Claude Code settings. Format: "Bash(git *)", "mcp__server__tool". Wildcard matching. Compound bash commands are split and checked independently; the most restrictive result wins.
- Hook (Claude Code)
- A shell command, HTTP POST, LLM prompt, or full agent triggered by a Claude Code lifecycle event (PreToolUse, PostToolUse, Stop, SessionStart, UserPromptSubmit, PreCompact, ...). The exit code controls behavior: 0 succeeds, 2 blocks, and any other code shows stderr to the user.
- MCP (Model Context Protocol)
- An open standard for connecting agents to external tools and data. Servers expose tools that appear in Claude Code as mcp__<server>__<tool>. Three transports: stdio, HTTP, SSE. Pi does not ship MCP support; it can be added via extension or replaced with skills that wrap CLI tools.
- Task tool / sub-agent (Claude Code)
- Claude Code's built-in mechanism for spawning a sub-agent with isolated context, optionally restricted tools, foreground or background, optional worktree isolation. Strict top-down: parent passes prompt, child returns one final result. Contrast with Pi's peer-to-peer model.
- Worktree isolation
- A Claude Code sub-agent option, isolation: "worktree", that gives the agent its own git worktree so changes don't touch your working directory until you merge. Pi achieves the same via tmux + git from an extension.
- TodoWrite
- Claude Code's built-in structured task list. Items have statuses (pending, in_progress, completed); renders in a persistent panel in the TUI. Pi's equivalent: write to TODO.md or build an extension.
- Plan mode
- Claude Code permission mode that blocks all writes and bash. Claude can read, search, and discuss, but must exit plan mode (via ExitPlanMode) to make changes. Pi: build via extension.
- Managed settings
- Claude Code's enterprise-control layer. Pushed via MDM (macOS), registry (Windows), or platform-specific file path. Locks: allowManagedHooksOnly, allowManagedPermissionRulesOnly, allowManagedMcpServersOnly, strictPluginOnlyCustomization. Takes precedence over user/project/local.
- Plugin (Claude Code)
- A bundle of skills, agents, hooks, and MCP configs distributed via npm or git. The closest analog to a Pi package. Can be locked-down via managed-settings.
- Floor vs ceiling
- The talk's framing for the relationship between Claude Code and Pi. Claude Code is the floor (great baseline available without effort). Pi is the ceiling (what becomes possible with effort). Most engineers should pick the floor most days; the leverage lives in choosing the right days to push past it.
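The agent loop entry above says the universal cycle fits in about 60 lines. Below is a compressed sketch in plain TypeScript that shows the shape of that cycle, with a stubbed model so it runs offline. Msg, Model, Tool, ToolCall and runAgent are hypothetical names, not Pi's internals. A real harness also streams output and honors abort signals. It applies its context strategy (compaction, truncation) where this sketch merely copies the message list, and it fires hook-bus events around each step.

```typescript
type Role = "user" | "assistant" | "tool";
interface Msg {
  role: Role;
  text: string;
  toolCallId?: string;
}
interface ToolCall {
  id: string;
  name: string;
  args: Record<string, unknown>;
}
interface ModelReply {
  text: string;
  toolCalls: ToolCall[];
}
type Model = (context: Msg[]) => Promise<ModelReply>;

// Tool registry entry: a description for the model, an executor returning { content, details, isError }.
interface Tool {
  description: string;
  execute: (args: Record<string, unknown>) => Promise<{ content: string; details?: unknown; isError: boolean }>;
}

async function runAgent(model: Model, tools: Record<string, Tool>, prompt: string, maxTurns = 20): Promise<Msg[]> {
  const messages: Msg[] = [{ role: "user", text: prompt }];
  for (let turn = 0; turn < maxTurns; turn++) {
    const context = messages.slice(); // build context (a real harness compacts/truncates here)
    const reply = await model(context); // call the model
    messages.push({ role: "assistant", text: reply.text }); // append its response
    if (reply.toolCalls.length === 0) break; // no tool calls: the loop is done
    for (const call of reply.toolCalls) {
      const tool = tools[call.name];
      const result = tool
        ? await tool.execute(call.args) // execute the tool call
        : { content: `unknown tool: ${call.name}`, isError: true };
      messages.push({ role: "tool", text: result.content, toolCallId: call.id }); // append the result
    }
  }
  return messages;
}

// Stub model: requests one tool call, then answers using the tool result.
const stubModel: Model = async (context) => {
  const last = context[context.length - 1];
  if (last.role === "user") {
    return { text: "Checking.", toolCalls: [{ id: "t1", name: "add", args: { a: 2, b: 3 } }] };
  }
  return { text: `The sum is ${last.text}.`, toolCalls: [] };
};

const tools: Record<string, Tool> = {
  add: {
    description: "Add two numbers",
    execute: async (args) => ({ content: String(Number(args.a) + Number(args.b)), isError: false }),
  },
};

runAgent(stubModel, tools, "What is 2 + 3?").then((msgs) => console.log(msgs[msgs.length - 1].text)); // The sum is 5.
```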
Primary sources #
Where to go to verify and to go deeper. Linked once here so they are easy to find when the body text references them.
The talks and the speaker
- Andy "Dev Dan" Hennings (IndyDevDan), channel and writing on agentic engineering: agenticengineer.com.
- Talk 1: "Top 1 Opportunity for Senior Engineers" — the five pillars overview that anchors Chapters 01-07 of this wiki.
- Talk 2: "Pi to Pi Agent Communication" — the worked example of peer-to-peer harness extension, anchoring Chapter 08.
- Karpathy at the Sequoia AI Ascent (the naming event for "agentic engineering"): sequoiacap.com/ai-ascent.
Pi coding agent
- Homepage: pi.dev
- Docs: pi.dev/docs/latest
- Source: github.com/earendil-works/pi
- npm: @earendil-works/pi-coding-agent
- Discord: community server
- Package directory: pi.dev/packages
- Models reference: pi.dev/models
- Author blog (Mario Zechner): launch post at mariozechner.at and the MCP essay.
Claude Code (Chapters 16-18)
- Anthropic's official Claude Code documentation: docs.claude.com/en/docs/claude-code
- Community Claude Code wiki (the source for the Claude Code facts in Chapters 16-18; each page cites its own upstream): mintlify.wiki/VineeTagarwaL-code/claude-code
- Concepts — how it works: how-it-works
- Concepts — tools: tools
- Concepts — memory and CLAUDE.md: memory-context
- Concepts — permissions: permissions
- Guide — hooks: hooks
- Guide — skills: skills
- Guide — MCP servers: mcp-servers
- Guide — multi-agent: multi-agent
- Configuration — settings: settings
- Reference — commands overview: commands-overview
- MCP standard: modelcontextprotocol.io
- Agent Skills standard (followed by both Pi and Claude Code): agentskills.io/specification
- Anthropic's skill repository (consumable by both tools): github.com/anthropics/skills
Tools referenced in the case study
- Cloud sandbox for agents (the canonical example in Demo 2): e2b.dev.
- Persistent-VM sandbox compared in Demo 2: exe.dev.
- "Pi vs Cloud Code" reference codebase with the comms and comms-net extensions: see the speaker's channel for the current GitHub link (agenticengineer.com).
Deep-dive sources (Chapters 10-15)
- Extensions API reference: pi.dev/docs/latest/extensions
- Session file format: pi.dev/docs/latest/session-format
- Compaction & branch summarization: pi.dev/docs/latest/compaction
- SDK reference: pi.dev/docs/latest/sdk
- RPC mode protocol: pi.dev/docs/latest/rpc
- JSON event stream mode: pi.dev/docs/latest/json
- Skills (Agent Skills standard): pi.dev/docs/latest/skills · agentskills.io spec
- Pi packages (sharing extensions): pi.dev/docs/latest/packages
- Source: session-manager.ts, compaction.ts, branch-summarization.ts.
- Example extensions: examples/extensions (50+ files). SDK examples: examples/sdk.
- TypeBox (schema for tool parameters): github.com/sinclairzx81/typebox
- Bun (used for the comms-net reference server): bun.sh
- jiti (how Pi loads TS extensions without a build step): github.com/unjs/jiti
Foundational principles
- Open-Closed Principle, Bertrand Meyer, Object-Oriented Software Construction (1988).
- Unix philosophy (small composable tools): Ritchie and Thompson, CACM, 1974; McIlroy interview.
- Toyota Production System (the standardization-and-instrumentation precedent for software factories): Toyota global site.
- Ford moving assembly line, 1913: The Henry Ford archive.
Tokenomics adjacent reading
- Bill Gurley on the LTV math trap: Above the Crowd, 2012.
- Andrew Chen on arbitrages eroding: "The Law of Shitty Clickthroughs."