-
Google Open Sources Experimental Multi-Agent Orchestration Testbed Scion
Designed to manage concurrent agents running in containers across local and remote compute, Scion is an experimental orchestration testbed that enables developers to run groups of specialized agents with isolated identities, credentials, and shared workspaces.
-
OpenAI Extends the Responses API to Serve as a Foundation for Autonomous Agents
OpenAI announced they are extending the Responses API to make it easier for developers to build agentic workflows, adding support for a shell tool, a built-in agent execution loop, a hosted container workspace, context compaction, and reusable agent skills.
-
Green IT: How to Reduce the Impact of AI on the Environment
AI poses major challenges for green IT: each query consumes vast energy, GPU chips last only two to three years, and costs stay hidden from users, Ludi Akue said, adding that regulatory frameworks like the EU AI Act fall short on enforcement. In her talk "What I Wish I Knew When I Started with Green IT", she presented model compression, quantization, and novel architectures, treating sustainability as a design constraint.
-
QCon London 2026: Ethical AI is an Engineering Problem
At QCon London 2026, Clara Higuera, responsible AI program lead at BBVA, presented how many of the risks associated with AI systems are fundamentally engineering challenges rather than purely governance or policy issues.
-
Apple Improves Context Window Management for its Foundation Models
iOS 26.4, now in Release Candidate, introduces improved context window management for Apple's Foundation Models, helping developers work with the 4096-token context window limit. The new capabilities encourage developers to treat the context window as a constrained resource, managing it actively much like memory in a low-resource system.
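The "context window as constrained memory" idea can be sketched as a simple token-budgeting routine: keep the system prompt, then retain only the most recent transcript turns that still fit. This is a generic illustration, not Apple's API; the word-count tokenizer is a crude stand-in for whatever tokenizer the model actually uses.

```python
CONTEXT_LIMIT = 4096

def count_tokens(text: str) -> int:
    # Placeholder: a real implementation would use the model's own tokenizer.
    return len(text.split())

def fit_transcript(system_prompt: str, turns: list[str],
                   limit: int = CONTEXT_LIMIT) -> list[str]:
    """Keep the newest turns that fit in the budget left after the system prompt."""
    budget = limit - count_tokens(system_prompt)
    kept = []
    for turn in reversed(turns):      # walk newest-first
        cost = count_tokens(turn)
        if cost > budget:
            break                     # oldest remaining turns are dropped
        kept.append(turn)
        budget -= cost
    return list(reversed(kept))       # restore chronological order
```

Dropping from the oldest end is the simplest policy; real systems often combine it with summarization or compaction of the evicted turns.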
-
Sonatype Launches Guide to Enhance Safety in AI-Assisted Code Generation
Sonatype Guide is a real-time guardrail system that sits between AI coding tools and the open-source ecosystem, ensuring AI-generated code uses safe, valid, and maintainable dependencies.
-
Stripe Engineers Deploy Minions, Autonomous Agents Producing Thousands of Pull Requests Weekly
Stripe engineers describe Minions, autonomous coding agents generating over 1,300 pull requests per week. Tasks can originate from Slack, bug reports, or feature requests. Using LLMs, blueprints, and CI/CD pipelines, Minions produce production-ready changes while maintaining reliability and human review.
-
QCon London 2026: Refreshing Stale Code Intelligence
At QCon London 2026, Jeff Smith discussed the growing mismatch between AI coding models and real-world software development. While AI tools are enabling developers to generate code faster than ever, Smith argued that the models themselves are increasingly “stale” because they lack the repository-specific knowledge required to produce production-ready contributions.
-
Google Researchers Propose Bayesian Teaching Method for Large Language Models
Google Research has proposed a training method that teaches large language models to approximate Bayesian reasoning by learning from the predictions of an optimal Bayesian system. The approach focuses on improving how models update beliefs as they receive new information during multi-step interactions.
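The target behavior described above can be grounded with a minimal example of exact Bayesian belief updating over a sequence of observations, which the trained model is meant to approximate. This is a generic textbook sketch, not Google's training code.

```python
def bayes_update(prior: dict[str, float],
                 likelihood: dict[str, float]) -> dict[str, float]:
    """One Bayes step: posterior is proportional to prior times likelihood."""
    unnorm = {h: prior[h] * likelihood[h] for h in prior}
    z = sum(unnorm.values())
    return {h: p / z for h, p in unnorm.items()}

# Two hypotheses about a coin: fair (P(heads)=0.5) or biased (P(heads)=0.9).
belief = {"fair": 0.5, "biased": 0.5}
heads_lik = {"fair": 0.5, "biased": 0.9}

# Observe three heads in a row; belief shifts sharply toward "biased".
for _ in range(3):
    belief = bayes_update(belief, heads_lik)
```

After three heads the posterior on "biased" is 0.9³/(0.5³ + 0.9³) ≈ 0.85, the kind of incremental belief revision the method rewards across multi-step interactions.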
-
DoorDash Builds LLM Conversation Simulator to Test Customer Support Chatbots at Scale
DoorDash engineers built a simulation and evaluation flywheel to test large language model customer support chatbots at scale. The system generates multi-turn synthetic conversations using historical transcripts and backend mocks, evaluates outcomes with an LLM-as-judge framework, and enables rapid iteration on prompts, context, and system design before production deployment.
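The simulate-then-judge loop can be sketched in miniature: generate a synthetic conversation against a mocked backend, then score the outcome with a judge function standing in for an LLM-as-judge call. All names here (mock_chatbot, judge) are illustrative, not DoorDash's actual system.

```python
def mock_chatbot(message: str, backend: dict[str, str]) -> str:
    # Mocked support bot: answers from canned backend data, else deflects.
    for key, answer in backend.items():
        if key in message.lower():
            return answer
    return "Sorry, I can't help with that."

def simulate_conversation(turns: list[str],
                          backend: dict[str, str]) -> list[tuple[str, str]]:
    """Replay synthetic customer turns against the mocked bot."""
    return [(turn, mock_chatbot(turn, backend)) for turn in turns]

def judge(transcript: list[tuple[str, str]]) -> float:
    # Stand-in judge: fraction of turns that got a substantive answer.
    answered = sum(1 for _, reply in transcript if not reply.startswith("Sorry"))
    return answered / len(transcript)

backend = {"refund": "Your refund was issued today.",
           "order": "Your order arrives at 6 pm."}
transcript = simulate_conversation(
    ["Where is my order?", "Can I get a refund?", "Tell me a joke"], backend)
score = judge(transcript)   # 2 of 3 turns answered
```

In the real flywheel, the customer turns would come from an LLM seeded with historical transcripts, and the judge would be an LLM scoring rubric rather than a string check; the loop structure is the same.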
-
AWS Launches Strands Labs for Experimental AI Agent Projects
Amazon Web Services has introduced Strands Labs, a new GitHub organization created to host experimental projects related to agent-based AI development.
-
Scaling Human Judgment: How Dropbox Uses LLMs to Improve Labeling for RAG Systems
To improve the relevance of responses produced by Dropbox Dash, Dropbox engineers began using LLMs to augment human labeling, which plays a crucial role in identifying the documents that should be used to generate the responses. Their approach offers useful insights for any system built on retrieval-augmented generation (RAG).
-
Google Publishes Scaling Principles for Agentic Architectures
Researchers from Google and MIT published a paper describing a predictive framework for scaling multi-agent systems. The framework reveals a tool-coordination trade-off and can be used to select an optimal agentic architecture for a given task.
-
OpenAI Codex-Spark Achieves Ultra-Fast Coding Speeds on Cerebras Hardware
In a major shift in its hardware strategy, OpenAI launched GPT-5.3-Codex-Spark, its first production AI model deployed on Cerebras wafer-scale chips rather than traditional Nvidia GPUs. The new model delivers improved throughput and low latency, enabling a real-time, interactive coding experience, the company says.
-
Apple Researchers Introduce Ferret-UI Lite, an On-Device AI Model for Seeing and Controlling UIs
Apple's Ferret-UI Lite is a 3B-parameter model optimized for mobile and desktop screens, designed to interpret screen images, understand UI elements such as icons and text, and interact with apps: reading messages, checking health data, and more.