Anthropic's Code With Claude Announces Managed Agents, Proactive Workflows, Capability Curve

Anthropic hosted Code with Claude 2026 in San Francisco on May 6, publishing livestream sessions to YouTube that covered shipping work across Claude Code, the Claude Developer Platform, and partner deployments at GitHub, Vercel, Datadog, Bun, and several AI-native startups. Across the day, the throughline was the consequences of model step-changes for product architecture, organizational design, and infrastructure economics.

Dickson Tsai of Anthropic's Claude Code team showed recent updates to Claude Code. On the developer experience side, remote control lets a session start on one machine and continue on a phone, and a redesigned desktop GUI adds split views, inline diff comments, and the ability to pin assistant messages as chapters with a generated table of contents. On the autonomy side, auto mode moves permission decisions to a classifier that screens for destructive actions and prompt injection, and worktrees give Claude enter and exit tools to spin up isolated branches on its own. Tsai also demonstrated routines, which run prompts on cron schedules, GitHub webhooks, or API endpoints.

GitHub chief product officer Mario Rodriguez followed in a session co-presented with Anthropic's Brad Abrams. Rodriguez framed cache hit rate as the foundational metric for any team sending billions of messages to the platform. "It's kind of like high frequency trading," he said. "Just 1% efficiency means millions overall." GitHub targets cache hit rates above 94 percent, and a drop to 70 percent typically signals a bug in prompt assembly. Rodriguez listed three causes of cache invalidation that GitHub has had to engineer around.
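The thresholds Rodriguez cited can be turned into a simple health check. The following is a minimal sketch, not GitHub's tooling: the `Usage` record, the token-weighted definition of hit rate, and the status labels are all assumptions made for illustration, since the session summary does not say how GitHub computes the metric.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Usage:
    """Hypothetical per-request record of prompt tokens (names are assumptions)."""
    cached_input_tokens: int    # prompt tokens served from the cache
    uncached_input_tokens: int  # prompt tokens processed from scratch


def cache_hit_rate(usages: Iterable[Usage]) -> float:
    """Token-weighted hit rate: cached prompt tokens over all prompt tokens."""
    cached = 0
    total = 0
    for u in usages:
        cached += u.cached_input_tokens
        total += u.cached_input_tokens + u.uncached_input_tokens
    return cached / total if total else 0.0


def classify(rate: float, target: float = 0.94, alarm: float = 0.70) -> str:
    """Map a hit rate onto the two thresholds mentioned in the talk."""
    if rate > target:
        return "healthy"
    if rate <= alarm:
        return "investigate prompt assembly"
    return "below target"


if __name__ == "__main__":
    window = [Usage(9_500, 500), Usage(9_700, 300)]
    rate = cache_hit_rate(window)
    print(f"{rate:.1%} -> {classify(rate)}")
```

A token-weighted ratio is used here because long prompts dominate cost; a request-weighted ratio would be an equally plausible reading of the metric.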

Abrams used the slot to introduce an advisor strategy in which a smaller executor model such as Haiku calls a larger advisor model such as Opus only on the hard cases. "We get close to Opus-level intelligence at much lower prices because we're being very conservative about the tokens that advisor actually sends," Abrams said. Rodriguez paired that with a critic, internally nicknamed Rubber Duck, that runs at three points: after planning, after a complex implementation, and after writing tests but before running them.
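The control flow of the advisor strategy can be sketched as follows. This is an illustrative outline only: the `Attempt` type, the `confident` flag used to detect a hard case, and the stub models are assumptions, and no real model API is called. The talk summary does not specify how the executor decides that a case is hard.

```python
from typing import Callable, NamedTuple, Optional


class Attempt(NamedTuple):
    answer: str
    confident: bool  # assumed signal that the executor handled the task


# An executor takes a task plus an optional hint; an advisor returns a short hint.
Executor = Callable[[str, Optional[str]], Attempt]
Advisor = Callable[[str], str]


def solve(task: str, executor: Executor, advisor: Advisor) -> str:
    """Try the small model first; consult the large model only on hard cases."""
    first = executor(task, None)
    if first.confident:
        return first.answer
    # Hard case: request a brief hint so the advisor emits few tokens,
    # then let the executor finish the work with that guidance.
    hint = advisor(task)
    return executor(task, hint).answer


# Stub models for demonstration; real ones would wrap model calls.
def stub_executor(task: str, hint: Optional[str]) -> Attempt:
    if hint is not None:
        return Attempt(f"{task}: solved using hint '{hint}'", True)
    return Attempt(f"{task}: solved directly", "easy" in task)


def stub_advisor(task: str) -> str:
    return "split the problem in two"


if __name__ == "__main__":
    print(solve("easy rename", stub_executor, stub_advisor))
    print(solve("tricky refactor", stub_executor, stub_advisor))
```

Keeping the advisor's output to a hint rather than a full solution reflects the cost argument in the quote: the expensive model contributes few tokens, and the cheap model does the bulk of the generation.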

Anthropic Managed Agents product manager Jess Yan and Anthropic member of technical staff Lance Martin demoed Claude Managed Agents around the lunch slot and argued that infrastructure, rather than intelligence, is now the bottleneck for production agents, walking through primitives for sandboxed code execution, checkpointing, and credential scoping.

Anthropic co-founder and CEO Dario Amodei and co-founder and president Daniela Amodei took the main stage at 1 p.m. Daniela Amodei said developers "are the most important users of Claude" and described an internal cultural value, "hold light and shade," that governs how Anthropic ships powerful models alongside safety guardrails. Dario Amodei reported that first-quarter 2026 revenue and usage, on an annualized basis, grew 80x rather than the 10x Anthropic had planned for. He said that gap is the underlying cause of recent compute pressure, which the SpaceX partnership announced earlier in the day partly addresses.

He reiterated his earlier prediction that a one-person billion-dollar company would emerge in 2026, noting that two-person companies built with AI have already crossed a billion dollars in valuation. He said the next inflection is teams of agents working at the level of organizations rather than individuals, and that the things slowing down are the non-verifiable parts of software engineering such as design quality and security review, which Anthropic is now focused on training models to handle.

Anthropic Claude Code head Boris Cherny and Bun creator Jarred Sumner used a live coding session to show how Bun maintains itself with a Robobun bot that reproduces every issue and only opens a pull request once a generated regression test fails on the previous Bun version and passes on the fix branch. Datadog VP of engineering Sesh Nalla introduced a machine tool concept that has agents emit "precise specifications of the intent and problem domain" rather than invent disconnected tools for every local need.
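The Robobun gate described above reduces to a two-sided check on the generated regression test. The sketch below shows only that decision logic; `should_open_pr`, the `run_regression_test` callable, and the ref names are hypothetical and do not reflect Bun's actual implementation.

```python
from typing import Callable


def should_open_pr(
    run_regression_test: Callable[[str], bool],
    previous_ref: str,
    fix_ref: str,
) -> bool:
    """Open a pull request only if the test proves both the bug and the fix.

    run_regression_test(ref) is assumed to return True when the generated
    test passes against the build identified by ref.
    """
    # The test must fail on the previous version, showing it reproduces the issue.
    reproduces_bug = not run_regression_test(previous_ref)
    # The same test must pass on the fix branch, showing the change resolves it.
    passes_on_fix = run_regression_test(fix_ref)
    return reproduces_bug and passes_on_fix


if __name__ == "__main__":
    # Simulated test outcomes keyed by ref, for demonstration only.
    outcomes = {"previous-release": False, "fix-branch": True}
    print(should_open_pr(outcomes.__getitem__, "previous-release", "fix-branch"))
```

Requiring the failure on the previous version guards against tests that pass trivially and would therefore say nothing about whether the fix works.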

Vercel CEO Guillermo Rauch then sat down with Anthropic platform product head Angela Jiang. Rauch reported that Opus tokens represent roughly twenty-something percent of Vercel AI Gateway usage but more than seventy percent of spend, and that credit spend on v0 has doubled since the most recent Anthropic upgrade. He described three effects of smarter models: Vercel could simplify the harness, improved model taste meant v0 could absorb a decade of Vercel's design judgment rather than fight it, and the tool surface contracted as models wrote intermediate code in sandboxes rather than relying on predefined sub-agents. "We're now engineering more around tool approvals," Rauch said. "It's around creating the right security guardrails."

A panel moderated by Anthropic startup partnerships lead Beth Robertson brought together Cognition co-founder Walden Yan, Gamma head of product for AI Deeni Fatiha, and Harvey applied research head Niko Grupen to discuss product architecture under exponential model progress. Cognition makes Devin, an autonomous coding agent that runs its own computer; Gamma is an AI-native presentation and document tool with more than 70 million users; and Harvey is the generative AI platform for legal and professional services. Each panelist described having to rewrite their product around a model inflection point.

Brad Abrams returned later in the day for a standalone Claude Platform session focused on prompt caching, structured outputs, and tool design patterns observed across customers running large workloads. Anthropic developer relations head Alex Albert closed the day by reporting that Claude moved from 62 percent on SWE-bench Verified with Sonnet 3.7 a year ago to 87 percent with Opus 4.7, and used the capability curve framing to set expectations for the year ahead.

Developers interested in learning more can watch the full session recordings on Anthropic's YouTube channel, browse the Code with Claude session pages on claude.com, or register for the London edition on May 19 and Tokyo on June 10.
