
QConSF 2025 - Developing Claude Code at Anthropic at AI Speed

At QCon San Francisco 2025, Adam Wolff described how Claude Code at Anthropic is built with an AI coding assistant at the center of the workflow. He reported that about 90% of the tool's production code is written by or with Claude Code. The team ships continuously to internal users and targets weekday releases for external users.

With an assistant that can quickly generate and refactor both code and tests, he said, planning loses some of its former central role. The limiting factor becomes how fast teams can ship, observe behavior in production, and update their understanding of the requirements.

"Implementation used to be the expensive part of the loop. With AI in the workflow, the limiting factor is how fast you can collect and respond to feedback." 

Claude Code needs rich terminal input, including slash commands, file mentions, and keystroke-specific behavior. Conventional advice says not to rebuild text input, because users expect a large set of editing shortcuts. The team took over input handling anyway, because they needed control of every keystroke. Wolff described this decision as a bet that could only be evaluated after shipping the first version, and he presented three stories from Claude Code's development.

In the first story, they introduced a virtual Cursor class that models the text buffer and cursor position as an immutable value. The initial implementation was a few hundred lines of TypeScript supported by a substantial test suite. Later, another engineer added Vim mode on top in a single pull request, with hundreds of lines of logic and tests generated with Claude Code.
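
A minimal sketch of what an immutable cursor value can look like in TypeScript follows; the class and method names are illustrative, not Claude Code's actual API. The key property is that every editing operation returns a new value rather than mutating state, which keeps behaviors easy to layer and to test.

```typescript
// Illustrative sketch: an immutable cursor over a text buffer.
// Each operation returns a new Cursor instead of mutating this one.
class Cursor {
  constructor(
    readonly text: string,
    readonly offset: number, // caret position, 0..text.length
  ) {}

  insert(chars: string): Cursor {
    const text =
      this.text.slice(0, this.offset) + chars + this.text.slice(this.offset);
    return new Cursor(text, this.offset + chars.length);
  }

  backspace(): Cursor {
    if (this.offset === 0) return this;
    const text =
      this.text.slice(0, this.offset - 1) + this.text.slice(this.offset);
    return new Cursor(text, this.offset - 1);
  }

  left(): Cursor {
    return new Cursor(this.text, Math.max(0, this.offset - 1));
  }

  right(): Cursor {
    return new Cursor(this.text, Math.min(this.text.length, this.offset + 1));
  }
}
```

Because each operation is a pure function from one Cursor to the next, a Vim layer can be expressed as a mapping from keystrokes to sequences of these primitives, and every transition is straightforward to unit test.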

As adoption grew across languages, Unicode-related issues began to surface. The team added grapheme clustering, and a later refactor reduced worst-case latency from several seconds per keystroke to a few milliseconds by deferring work and using more efficient search strategies. Wolff treated this story as an example of a successful experiment: the pain of the additional complexity decreased over time, and the architecture continued to support fast changes.
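
The Unicode problem is easy to reproduce: JavaScript string offsets count UTF-16 code units, so stepping the caret by one offset can land inside an emoji or a combining sequence. The sketch below shows one way to move by grapheme clusters with the standard Intl.Segmenter API; it is illustrative rather than Claude Code's actual implementation.

```typescript
// Sketch: compute grapheme-cluster boundaries so caret movement steps over
// user-perceived characters rather than UTF-16 code units.
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

function graphemeBoundaries(text: string): number[] {
  const boundaries = [0];
  for (const part of segmenter.segment(text)) {
    boundaries.push(part.index + part.segment.length);
  }
  return boundaries;
}

function nextBoundary(text: string, offset: number): number {
  return graphemeBoundaries(text).find((i) => i > offset) ?? text.length;
}

function prevBoundary(text: string, offset: number): number {
  return graphemeBoundaries(text).filter((i) => i < offset).pop() ?? 0;
}

// A family emoji is a single grapheme spanning many code units;
// prevBoundary/nextBoundary step over it atomically.
```

Naively re-segmenting a large buffer on every keystroke is exactly the kind of eager work that can produce multi-second worst cases; deferring or caching that segmentation is the kind of work reduction the refactor Wolff described depends on.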

In the second story, Wolff examined how Claude interacts with the shell. The first design was a PersistentShell class that managed one long-running shell process behind a queue of commands. This preserved natural shell semantics for working directory and environment variables, since each command ran in the same process. The implementation was several hundred lines of code with logic for queueing, recovery, and pseudo-terminal handling.
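
In outline, that first design can be sketched as follows; this is a simplified illustration rather than the actual PersistentShell class, and it omits the recovery and pseudo-terminal handling Wolff mentioned.

```typescript
import { spawn, ChildProcess } from "node:child_process";

// Simplified sketch of the single-process design: one long-lived shell,
// with commands serialized through a promise chain so state such as the
// working directory and exported variables carries over between commands.
class PersistentShell {
  private shell: ChildProcess = spawn("/bin/bash", [], { stdio: "pipe" });
  private queue: Promise<unknown> = Promise.resolve();

  run(command: string): Promise<string> {
    const result = this.queue.then(() => this.exec(command));
    this.queue = result.catch(() => undefined); // keep the chain alive on failure
    return result;
  }

  private exec(command: string): Promise<string> {
    return new Promise((resolve) => {
      const marker = `__done_${Date.now()}__`;
      let output = "";
      const onData = (chunk: Buffer) => {
        output += chunk.toString();
        if (output.includes(marker)) {
          this.shell.stdout?.off("data", onData);
          resolve(output.slice(0, output.indexOf(marker)));
        }
      };
      this.shell.stdout?.on("data", onData);
      this.shell.stdin?.write(`${command}\necho ${marker}\n`);
    });
  }
}
```

The promise chain is what preserves natural shell semantics, and it is also what later serialized the batch tool's parallel calls.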

Problems arose when the team introduced a batch tool that allowed the model to run multiple commands at once. The queue inside PersistentShell serialized these calls, creating a bottleneck in agent behavior. The team replaced it with a design in which each command starts a new shell process. After shipping this change and receiving complaints, they settled on a snapshot approach that captures aliases and functions once in the user shell and sources that script before each transient command. Wolff observed that "you do not plan this kind of design, you discover it through experimentation."
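
A rough sketch of the snapshot idea follows; the file path, the captured state, and the helper names are illustrative assumptions rather than Claude Code's implementation.

```typescript
import { execFile } from "node:child_process";
import { writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Capture the user's aliases, functions, and exported variables once,
// by running an interactive shell that loads their startup files.
async function snapshotShell(): Promise<string> {
  const snapshotPath = join(tmpdir(), "shell-snapshot.sh");
  const { stdout } = await execFileAsync("/bin/bash", [
    "-ic",
    "alias; declare -f; export -p",
  ]);
  await writeFile(snapshotPath, stdout);
  return snapshotPath;
}

// Source the snapshot at the start of every short-lived command, so each
// command runs in a fresh process but still sees the user's shell setup.
async function runTransient(snapshotPath: string, command: string): Promise<string> {
  const { stdout } = await execFileAsync("/bin/bash", [
    "-c",
    `source ${JSON.stringify(snapshotPath)}; ${command}`,
  ]);
  return stdout;
}
```

Because every command now runs in its own process, calls issued by the batch tool can execute concurrently, while the sourced snapshot keeps aliases and functions behaving as they do in the user's interactive shell.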

"Shipping small changes frequently, and being willing to unship when needed, is central to our use of AI in development. The loop of build, ship, observe, and adjust is where most of the value appears."

The third story focused on persistence for Claude Code conversations. The initial implementation used append-only JSONL files on disk, which required no external services and had no special installation requirements. This design already worked for production users. Wolff still wanted stronger query capabilities and structured migrations, so the team decided to adopt SQLite with a type-safe ORM.
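
Part of the appeal of the JSONL design is how little machinery it needs. A minimal sketch, with an illustrative record shape, is just an append and a line-by-line parse:

```typescript
import { appendFile, readFile } from "node:fs/promises";

// Sketch of append-only JSONL persistence: one JSON object per line,
// appended as the conversation progresses; no server and no native driver.
interface ConversationEvent {
  timestamp: string;
  role: "user" | "assistant";
  content: string;
}

async function appendEvent(path: string, event: ConversationEvent): Promise<void> {
  await appendFile(path, JSON.stringify(event) + "\n");
}

async function loadConversation(path: string): Promise<ConversationEvent[]> {
  const raw = await readFile(path, "utf8");
  return raw
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as ConversationEvent);
}
```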

After the database-backed version shipped, problems appeared in rapid sequence. The native SQLite driver caused install failures on some systems, especially with package managers that handle native binaries in strict ways; Wolff commented that "native dependencies basically do not work for this distribution model." Locking behavior in SQLite under concurrent access also did not match developers' expectations, given their experience with row-level locks in other databases. Within fifteen days, the team removed the SQLite layer and returned to the simpler JSONL storage.

Across the three stories, Wolff returned to the question of what shipping reveals that planning does not. He framed the core distinction between detours and dead ends in terms of how the pain evolved. When each iteration reduced bugs and improved behavior, as in the cursor case, the team stayed on the path. When effort uncovered new composition techniques, as with transient shells and snapshots, the result was a productive failure that yielded a better structure. When work increased fragility and user impact without a clear path to improvement, as with the SQLite experiment, the best decision was to undo the change.

Developers who want to learn more can watch for the session video on InfoQ in the coming weeks and can view the slides here.
