QCon London AI Coding State of the Game: More Capable, More Expensive, More Dangerous Coding Agents

In her QCon London keynote, Birgitta Böckeler, Distinguished Engineer for AI-assisted Software Delivery at Thoughtworks, reflected on how the AI coding space has changed over the past year. She emphasised a shift from vibe coding toward autonomous coding agents and swarms of agents. According to her, the two major concerns in the field are the worsening security landscape and the rising cost of agent-based development.

In the introductory part of her presentation, she reminded the audience of the state of AI coding just a year ago: "Vibe coding was just two months old", "MCP was all the rage," and "Claude Code was not even generally available yet." She highlighted context engineering as probably the most significant development of the year: curating the information a model or agent reads so that it produces better results. Last spring, this was limited to a single rules file (agents.md or claude.md), loaded at the start of each session to capture coding conventions and recurring pitfalls.
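A minimal rules file of this kind might look like the following sketch; the project conventions shown are invented for illustration and were not part of the talk:

```
# agents.md — project conventions for the coding agent

- Use TypeScript strict mode; never introduce `any` without a justifying comment.
- Run the full test suite before declaring a task complete.
- Known pitfall: the orders service stores all timestamps in UTC.
```

The whole file is read into the model's context at session start, which is exactly why its size matters.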

Anthropic has since broken this "monolithic" file down into smaller skills, a more granular way to package coding capabilities. This enables a pragmatic "lazy loading" approach, in which different sets of rules are loaded depending on the task at hand. Besides improving organisation, this slows the rate at which the limited context window fills up. Even so, Böckeler pointed out that a "fresh" Claude Code session already sits at 15% of its capacity before any prompt is given.
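As an illustration of the contrast with a single rules file, skills live as small, separately loadable units; the directory layout and skill names below are assumptions for the sketch, not details from the talk:

```
.claude/skills/
  db-migrations/
    SKILL.md   # schema-change conventions; loaded only for migration tasks
  api-docs/
    SKILL.md   # OpenAPI update rules; loaded only when the task touches the API
```

Only the skill relevant to the current task is pulled into context, instead of every rule on every session.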

Böckeler emphasised that we are moving closer to "hands-off" coding: coding agents can now run unsupervised for up to 20 minutes, and headless CLI modes plug directly into CI/CD pipelines, for example via GitHub Actions. Some practitioners, following Steve Yegge's "eight stages of dev evolution to AI", run three or more local sessions in parallel; Böckeler, however, noted her own experience of "typing the wrong thing into the wrong session."
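As a sketch of such a pipeline integration, a headless agent step in a GitHub Actions workflow could look like the following. The workflow name, trigger, and prompt are illustrative; `claude -p` is Claude Code's non-interactive print mode, but installation steps and available flags vary by version, so treat this as an assumption-laden outline rather than a working recipe:

```
name: agent-triage            # illustrative workflow name
on:
  issues:
    types: [opened]
jobs:
  triage:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Assumes Claude Code has been installed in a prior step
      - name: Run Claude Code headless
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          # -p runs a single prompt non-interactively and exits
          claude -p "Summarise issue #${{ github.event.issue.number }} and suggest labels"
```

Note that this pattern, an agent reading attacker-controlled issue text with secrets in its environment, is precisely the setup the security section below warns about, so sandboxing and least-privilege tokens matter here.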

An even more advanced approach is to use swarms of coding agents. She argued, though, that the experiments from Cursor and Anthropic, in which a "team" of coding agents built C compilers or web browsers in a few days, are somewhat skewed: those tasks are well defined and come with extensive public test suites, which is rarely true of enterprise software. A more accessible entry point is Claude Code's Agent Teams feature, which orchestrates a small number of agents with a clear coordination model.

To calibrate the appropriate level of supervision, she proposed a risk framework based on three variables: the probability that the AI makes a mistake, the impact of that mistake, and the detectability of the error. Only the first variable is genuinely novel, requiring developers to build intuition for how well a tool handles a given task; the other two are engineering judgments that experienced developers should already possess.
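One way to operationalise the three variables is to combine them into a rough risk score that maps to a supervision level. The function name, score ranges, and thresholds below are illustrative assumptions, not something Böckeler prescribed:

```python
def supervision_level(p_mistake: float, impact: float, detectability: float) -> str:
    """Suggest how closely to supervise an agent task.

    All inputs are rough scores in [0, 1]; the thresholds are
    illustrative, not from the talk.
    """
    # Risk grows with mistake probability and impact, and shrinks
    # with how easily an error would be caught.
    risk = p_mistake * impact * (1.0 - detectability)
    if risk < 0.05:
        return "hands-off"
    if risk < 0.2:
        return "spot-check"
    return "review-every-change"
```

For instance, a likely-to-fail, high-impact task whose errors are hard to detect lands in "review-every-change", while a low-stakes task with good test coverage can run "hands-off".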

Beyond simply generating functionally incorrect code, coding agents are now involved in security incidents on a weekly basis, most of them rooted in prompt injection. Eleven days before the talk, an attacker used a crafted GitHub issue to extract secrets and upload malicious packages to an NPM registry, a direct result of an unsupervised agent operating without sufficient sandboxing. Simon Willison's "lethal trifecta" frames the risk more precisely: when an agent combines exposure to untrusted content, access to private data, and the ability to communicate externally, the danger becomes significant. For example, connecting an agent to an email account with read-and-send permissions satisfies all three conditions at once.
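The trifecta is essentially a conjunction of three capabilities, which makes it easy to express as a review checklist. The following predicate is a sketch of that idea (the function and parameter names are invented for illustration):

```python
def lethal_trifecta(reads_untrusted_content: bool,
                    has_private_data_access: bool,
                    can_communicate_externally: bool) -> bool:
    """Willison's 'lethal trifecta': prompt injection becomes
    exploitable only when all three capabilities coincide."""
    return (reads_untrusted_content
            and has_private_data_access
            and can_communicate_externally)

# The email example from the talk: the inbox supplies untrusted
# content AND private data, and send permission is the exfiltration
# channel, so all three arguments are True.
```

The practical consequence is that removing any one leg, for example by sandboxing network access, breaks the attack chain.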

Böckeler: "Security is not a technical problem; it's a conceptual problem."

In her conclusion, Böckeler noted that while model improvements are real, they are the least interesting developments compared to the shifts in tooling and practices surrounding them. An OpenAI team running a five-month autonomous greenfield project still reported entropy creeping in, despite custom linters and garbage-collection agents. The main question she posed to the audience was, "What practices will you enforce on your coding agent?" Whether these practices are good or bad, AI coding will amplify them.
