
Facilitating the Spread of Knowledge and Innovation in Professional Software Development



QCon AI NYC 2025 - Designing AI Platforms for Reliability: Tools for Certainty, Agents for Discovery


At QCon AI NYC 2025, Aaron Erickson presented agentic AI as an engineering problem, not a prompt-crafting exercise. His central message was that reliability comes from combining probabilistic components with deterministic boundaries.

Erickson argued that agentic AI becomes more interesting when treated as a layer over real operational systems rather than as a replacement for them. The model can interpret questions, retrieve evidence, classify situations, and suggest actions. Deterministic systems execute actions, enforce constraints, and provide the telemetry that enables the whole loop to be evaluated.

He described a common trap in natural-language-to-SQL and similar query-generation patterns. The first few demos work because the questions are simple and the schema is small. Accuracy falls sharply when the schema is complex and the query space includes many joins, edge cases, or overloaded fields. One mitigation he emphasized was reducing degrees of freedom: flatten the schema, constrain query forms, and treat expressiveness as a cost that must be paid with more evaluation and additional safeguards.
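
One way this mitigation might look in practice is to expose only vetted query templates and let the model fill in whitelisted parameters rather than emit free-form SQL. The sketch below is illustrative, not from the talk; the template and parameter names are hypothetical, and `:status` is deliberately left as a bind parameter for the database driver.

```python
from string import Template

# Hypothetical sketch of "reducing degrees of freedom": the model may only
# pick a template and fill in parameters drawn from a closed set.
QUERY_TEMPLATES = {
    "orders_by_status": Template(
        "SELECT order_id, total FROM orders WHERE status = :status LIMIT $limit"
    ),
}

ALLOWED_STATUSES = {"open", "shipped", "cancelled"}

def build_query(template_name: str, status: str, limit: int = 100) -> str:
    """Render a vetted template; reject anything outside the allowed space."""
    if template_name not in QUERY_TEMPLATES:
        raise ValueError(f"unknown template: {template_name}")
    if status not in ALLOWED_STATUSES:
        raise ValueError(f"unknown status: {status}")
    if not 1 <= limit <= 1000:
        raise ValueError("limit out of range")
    # The status value is validated here but bound at execution time,
    # so the model never injects raw text into the SQL string.
    return QUERY_TEMPLATES[template_name].substitute(limit=limit)
```

The expressive ceiling is low by design: adding a new query shape means adding a new reviewed template, which is exactly the evaluation cost Erickson described.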

He also observed a pragmatic difference between classification and code generation. When the system’s job is to select among a small set of known categories, a model can be very effective. When the system’s job is to invent an arbitrary program over a large search space, error rates climb. That gap becomes a design lever. You can ask the model to classify an intent, then route to a deterministic query template or a bounded tool call.
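
That design lever can be sketched as a classify-then-route pipeline: the model's only job is to pick one label from a closed set, and a deterministic handler does the real work. All names here are hypothetical, and `call_model` is a toy stand-in for an actual LLM call.

```python
# Sketch of "classify, then route": selection among known categories is
# cheap and reliable; arbitrary program synthesis is neither.
INTENTS = {"refund_status", "order_lookup", "escalate"}

def call_model(question: str) -> str:
    # Placeholder for an LLM prompted to answer with exactly one intent label.
    return "order_lookup" if "order" in question.lower() else "escalate"

HANDLERS = {
    "refund_status": lambda q: "run refund-status template",
    "order_lookup": lambda q: "run order-lookup template",
    "escalate": lambda q: "hand off to a human",
}

def route(question: str) -> str:
    intent = call_model(question)
    if intent not in INTENTS:  # reject anything outside the closed set
        intent = "escalate"
    return HANDLERS[intent](question)
```

Because the model output is checked against `INTENTS` before dispatch, a hallucinated category degrades to escalation rather than to an invented action.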

He showed a slide with a large cheesecake menu to make the point that tool choice is itself a reliability problem: "LLMs can suffer from 'paradox of choice'." When too many tools look similar, selection quality degrades, and the model may confidently choose a suboptimal or unsafe path. The engineering implication is that tool catalogs and tool interfaces are part of the product. Tooling should be differentiated, well described, and constrained, or the agent will behave like a user staring at an enormous menu, he said.
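
One way to enforce that discipline is to treat the catalog as code with its own invariants: registration fails fast if a new tool is indistinguishable from an existing one, or if the menu grows too long. This is a hypothetical sketch of the idea, not an API from the talk.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tool:
    name: str
    description: str

class ToolCatalog:
    """Keep the agent's 'menu' small and differentiated."""

    def __init__(self, max_tools: int = 8):
        self.max_tools = max_tools
        self.tools: dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        if len(self.tools) >= self.max_tools:
            raise ValueError("catalog full: consider a specialized agent instead")
        for existing in self.tools.values():
            # Crude duplicate check; a real catalog might compare embeddings.
            if existing.description.strip().lower() == tool.description.strip().lower():
                raise ValueError(
                    f"{tool.name} is indistinguishable from {existing.name}"
                )
        self.tools[tool.name] = tool
```

The cap on catalog size is a deliberate forcing function: once it is hit, the fix is to split responsibilities across specialized agents rather than lengthen the menu.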

Erickson then described why role specialization matters. A general-purpose agent that “knows a bit about everything” can be helpful for routing and summarization, but the system’s correctness depends on purpose-built components that do specific tasks with narrow contracts. He described a manager-like layer that delegates, but treated it as orchestration rather than as the place where domain logic should live. In his view, the important work is in the specialized agents and deterministic tools that actually touch the underlying systems.

This set up his taxonomy of agent behaviors. One of the most concrete examples was the “Worker Agent” slide, which showed someone painting spirals on rocks, paired with a prompt to examine large numbers of clusters and flag the ones worth attention. He argued agents can be deployed across thousands of similar records, do the same analysis repeatedly, and store structured outputs for later review.
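
The worker-agent pattern reduces to a simple loop: apply the same bounded analysis to every record and persist structured outputs for later review. In this illustrative sketch, `classify_cluster` stands in for a model call and the flagging heuristic is a toy; none of the names come from the talk.

```python
# Sketch of the "worker agent" pattern: same analysis, many records,
# structured output a human can review later.
def classify_cluster(cluster: dict) -> dict:
    flagged = cluster["error_rate"] > 0.05  # toy heuristic in place of a model
    return {"id": cluster["id"], "flagged": flagged}

def run_worker(clusters: list[dict]) -> list[dict]:
    results = [classify_cluster(c) for c in clusters]
    return [r for r in results if r["flagged"]]  # only the ones worth attention
```

Because each record is processed independently with the same contract, the loop parallelizes trivially and every output is auditable on its own.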

He described additional roles that help control complexity as systems grow. A tool-selection agent can help reduce ambiguity when there are multiple ways to achieve an outcome. An observer or consulting-style agent can monitor interactions between components and flag unsafe communication patterns, policy violations, or quality regressions. A director agent can delegate work across other agents and track progress toward a measurable outcome. The message mirrored classic testing guidance: push as much confidence as possible down into cheap tests and reserve complete system runs for validating integration behavior.

He also used a simple operational analogy to justify deterministic anchors. He asked whether you reinvent routine operations every time, then answered that you do not; you provide operators a deterministic runbook. He argued that agentic systems should inherit this habit. Where repeatability matters, encode repeatability in tools and runbooks, and let the agent decide when the runbook applies instead of allowing the agent to invent a new process for every incident.

Erickson finally returned to the split between certainty and discovery. Discovery is where agents explore, propose, and surface anomalies. Certainty is where deterministic tools execute bounded operations and enforce policy. He argued the boundary between them is where platform engineering lives: authentication, authorization, auditing, telemetry, and safe degradation.
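
That boundary can be sketched as a deterministic gate in front of every action the agent proposes: authorization is checked and an audit entry is written before anything executes. The action names and principal model here are hypothetical assumptions for illustration.

```python
# Sketch of the certainty/discovery boundary: the agent proposes,
# a deterministic gate authorizes, audits, and only then executes.
audit_log: list = []

ALLOWED_ACTIONS = {"restart_service", "scale_up"}

def execute(action: str, principal: str, authorized: set) -> bool:
    if action not in ALLOWED_ACTIONS:
        audit_log.append(f"DENY {principal}:{action} (unknown action)")
        return False
    if action not in authorized:
        audit_log.append(f"DENY {principal}:{action} (not authorized)")
        return False
    audit_log.append(f"ALLOW {principal}:{action}")
    return True  # a real system would invoke the deterministic tool here
```

Every path through the gate, allow or deny, leaves a telemetry record, which is what makes the whole loop evaluable after the fact.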

Developers who want to learn more can watch the full recording of the talk, which will be made available on InfoQ.com on January 15, 2026.
