Claude Opus 4.6 Introduces Adaptive Reasoning and Context Compaction for Long-Running Agents

Recently, Anthropic released Claude Opus 4.6, marking a shift from static inference to dynamic orchestration in its flagship model. The update introduces adaptive thinking effort controls and context compaction, architectural features designed to address context degradation and overthinking issues in long-running agentic workflows.

Claude Opus 4.6 is now available across all major cloud platforms, including Microsoft Foundry, AWS Bedrock, and Google Cloud's Vertex AI.

Opus 4.6 replaces binary reasoning toggles with four granular effort controls: low, medium, high (default), and max. This allows developers to programmatically calibrate the model's internal chain-of-thought depth based on task complexity.

Anthropic notes in its announcement that:

Opus 4.6 often thinks more deeply and more carefully revisits its reasoning before settling on an answer. This produces better results on harder problems, but can add cost and latency on simpler ones.

Moreover, the company recommends dialing effort down to medium for straightforward tasks to reduce latency and cost.

Thinking tokens are billed as output tokens at $25 per million. For agentic systems making dozens of API calls, managing these effort levels becomes a primary cost control mechanism.
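A minimal sketch of what that calibration might look like in practice, building a Messages API payload that dials effort down for simple tasks. The `effort` field name is an assumption based on the announcement's terminology; consult the official API reference for the exact parameter.

```python
# Sketch: choosing a thinking-effort level per task before calling the
# Claude API. The "effort" field and its values ("low", "medium", "high",
# "max") are assumptions drawn from the announcement's wording.

def build_request(prompt: str, complex_task: bool) -> dict:
    """Build a Messages API payload, reducing effort for simple tasks."""
    return {
        "model": "claude-opus-4-6",
        "max_tokens": 2048,
        # "high" is the default; "medium" trades reasoning depth for
        # lower latency and fewer billed thinking tokens.
        "effort": "high" if complex_task else "medium",
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_request("Summarize this changelog.", complex_task=False)
print(payload["effort"])  # medium
```

Because thinking tokens bill as output, routing routine requests through a lower effort level directly reduces per-call cost in multi-step agent loops.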

Opus 4.6 introduces a 1M-token context window in beta, enough to process approximately 750,000 words, but the more significant architectural update is context compaction. This feature addresses the performance degradation that occurs as context windows fill, a phenomenon Anthropic calls "context rot."

When a conversation approaches the limit, the API automatically summarizes earlier portions and replaces them with a compressed state. On the MRCR v2 (Multi-needle Retrieval) benchmark at 1M tokens, Opus 4.6 achieved 76% accuracy, which is a fourfold improvement over Sonnet 4.5's 18.5%. Anthropic describes this as:

A qualitative shift in how much context a model can actually use while maintaining peak performance.
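The mechanics described above can be illustrated with a simple sketch: when a transcript nears its token budget, older turns are folded into a compressed summary while recent turns are kept verbatim. The summarizer here is a trivial stand-in; in the actual feature, the API performs this server-side.

```python
# Illustrative sketch of context compaction. The summarize() stub stands
# in for a model-generated summary of the dropped turns.

def summarize(messages):
    # Stand-in for a model-generated summary of the earlier turns.
    return {"role": "user", "content": f"[Summary of {len(messages)} earlier turns]"}

def compact(messages, limit_tokens, count_tokens, keep_recent=4):
    """If the transcript exceeds the budget, fold older turns into a summary."""
    if sum(count_tokens(m) for m in messages) <= limit_tokens:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    return [summarize(old)] + recent

# Toy token counter: roughly one token per four characters.
toy_count = lambda m: len(m["content"]) // 4

history = [{"role": "user", "content": "x" * 400} for _ in range(10)]
compacted = compact(history, limit_tokens=500, count_tokens=toy_count)
print(len(compacted))  # 5: one summary message plus the 4 most recent turns
```

The trade-off is the same one the MRCR benchmark probes: a compacted state preserves room for new reasoning but can lose details that a later retrieval step needs.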

The model also delivers a maximum output of 128K tokens, doubling the previous 64K limit.

Microsoft positions its service, Foundry, as an interoperable platform where intelligence and trust converge to enable autonomous work. In its blog post, Microsoft states that Opus 4.6 can leverage Foundry IQ to access data from Microsoft 365 Work IQ, Fabric IQ, and the web.

Furthermore, Microsoft describes the model as:

Best applied to complex tasks across coding, knowledge work, and agent-driven workflows, supporting deeper reasoning while offering superior instruction following for reliability.

The company emphasizes Foundry's "managed infrastructure and operational controls" that allow teams to "compress development timelines from days into hours."

Opus 4.6 is also available through Microsoft Copilot Studio, Google Cloud's Vertex AI Agent Builder, and Amazon Bedrock Agents, enabling organizations to build and deploy AI agents without custom code.

The release includes Agent Teams in Claude Code as a research preview, allowing developers to spin up multiple agents that work in parallel and coordinate autonomously. Anthropic describes this as:

Best for tasks that split into independent, read-heavy work like codebase reviews.

Furthermore, Claude's integration into PowerPoint, also in research preview, allows the model to read layouts, fonts, and slide masters to generate presentations that stay on brand. The feature is available for Max, Team, and Enterprise plans.

Anthropic also claims state-of-the-art results on multiple evaluations:

[Bar chart comparing AI models' performance, with Elo scores across tasks such as agentic search, coding, and reasoning]

(Source: Anthropic blog post)

The model found over 500 previously unknown high-severity security vulnerabilities in open-source libraries, including Ghostscript, OpenSC, and CGIF. However, independent testing by Quesma revealed limitations: Claude Opus 4.6 detected backdoors in compiled binaries only 49% of the time when using open-source tools like Ghidra, with notable false positives.

Discussion on Hacker News highlighted concerns about regressions from Opus 4.5, with users reporting that the new model performs worse on certain tasks.

Base pricing remains $5 per million input tokens and $25 per million output tokens. However, a "long-context premium" of $10 per million input tokens and $37.50 per million output tokens applies to the entire request once input exceeds 200K tokens. The 1M context window is currently available in beta only through Claude's native API, and US-only inference carries a 1.1x pricing multiplier.
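A back-of-the-envelope cost model makes the tiering concrete. The figures come from the rates quoted above; treat the calculation as illustrative rather than billing-exact.

```python
# Cost estimate for one Opus 4.6 request using the article's quoted rates:
# $5/$25 per million input/output tokens at baseline, $10/$37.50 once
# input exceeds 200K tokens, and a 1.1x multiplier for US-only inference.

def request_cost(input_tokens: int, output_tokens: int, us_only: bool = False) -> float:
    """Estimate USD cost for a single request under the quoted pricing."""
    if input_tokens > 200_000:
        # The long-context premium applies to the WHOLE request,
        # not just the tokens beyond the 200K threshold.
        in_rate, out_rate = 10.00, 37.50
    else:
        in_rate, out_rate = 5.00, 25.00
    cost = (input_tokens / 1e6) * in_rate + (output_tokens / 1e6) * out_rate
    return cost * 1.1 if us_only else cost

print(round(request_cost(100_000, 10_000), 2))  # 0.75
print(round(request_cost(300_000, 10_000), 2))  # 3.38
```

Note the cliff at the threshold: crossing 200K input tokens re-rates every token in the request, so agents that hover near the boundary may benefit from compacting context before sending.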

Lastly, the model is accessible through claude.ai, the Claude API (model string: claude-opus-4-6), Microsoft Foundry, AWS Bedrock, Google Cloud Vertex AI, and GitHub Copilot for Pro, Business, and Enterprise users.
