-
The Architectural Shift: AI Agents Become Execution Engines While Backends Retreat to Governance
A fundamental shift in enterprise software architecture is emerging as AI agents transition from assistive tools to operational execution engines, with traditional application backends retreating to governance and permission management roles. This transformation is accelerating across sectors, with 40% of enterprise applications expected to include autonomous agents by 2026.
-
NVIDIA Introduces OmniVinci, a Research-Only LLM for Cross-Modal Understanding
NVIDIA has introduced OmniVinci, a large language model designed to understand and reason across multiple input types — including text, vision, audio, and even robotics data. The project, developed by NVIDIA Research, aims to push machine intelligence closer to human-like perception by unifying how models interpret the world across different sensory streams.
-
Anthropic Introduces Skills for Custom Claude Tasks
Anthropic has unveiled a new feature called Skills, designed to let developers extend Claude with modular, reusable task components.
-
DeepSeek AI Unveils DeepSeek-OCR: Vision-Based Context Compression Redefines Long-Text Processing
DeepSeek AI has developed DeepSeek-OCR, an open-source system that uses optical 2D mapping to compress long text passages. This approach aims to improve how large language models (LLMs) handle text-heavy inputs.
-
Google Research Open-Sources the Coral NPU Platform to Help Build AI into Wearables and Edge Devices
Coral NPU is an open-source full-stack platform designed to help hardware engineers and AI developers overcome the obstacles to integrating AI into wearables and edge devices, including performance, ecosystem fragmentation, and user trust.
-
Google Introduces LLM-Evalkit to Bring Order and Metrics to Prompt Engineering
Google has introduced LLM-Evalkit, an open-source framework built on Vertex AI SDKs, designed to make prompt engineering for large language models less chaotic and more measurable. The lightweight tool aims to replace scattered documents and guess-based iteration with a unified, data-driven workflow.
-
Researchers Introduce ACE, a Framework for Self-Improving LLM Contexts
Researchers from Stanford University, SambaNova Systems, and UC Berkeley have proposed Agentic Context Engineering (ACE), a new framework designed to improve large language models (LLMs) through evolving, structured contexts rather than weight updates. The method, described in a paper, seeks to make language models self-improving without retraining.
-
Hugging Face Introduces RTEB, a New Benchmark for Evaluating Retrieval Models
Hugging Face has unveiled the Retrieval Embedding Benchmark (RTEB), a framework for assessing embedding models' real-world retrieval accuracy. By combining public and private datasets, RTEB aims to narrow the "generalization gap" between benchmark scores and production performance. Now live and open to community contributions, RTEB is positioned as a community standard for AI retrieval evaluation.
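Retrieval benchmarks like RTEB typically score models with ranking metrics such as NDCG, which rewards placing highly relevant documents near the top. The following is an illustrative sketch of NDCG@k, not RTEB's actual evaluation harness; the relevance grades are made-up example data:

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain: relevance of each result, discounted
    logarithmically by its rank position (rank 1 -> log2(2), etc.)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """NDCG: DCG of the model's ranking divided by DCG of the ideal ranking."""
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Graded relevance (0-3) of the documents a model ranked 1st..5th for a query
ranking = [3, 2, 0, 1, 0]
print(round(ndcg_at_k(ranking, 5), 3))  # → 0.985
```

Averaging this score over many queries (and over both public and held-out private datasets, in RTEB's case) gives a single comparable retrieval-quality number per model.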
-
10 AI-Related Standout Sessions at QCon San Francisco 2025
Join us at QCon San Francisco 2025 (Nov 17–21) for a three-day deep dive into the future of software development, exploring AI’s transformative impact. As a program committee member, I’m excited to showcase tracks that tackle real-world challenges, featuring industry leaders and sessions on AI, LLMs, and engineering mindsets. Don’t miss out!
-
Paper2Agent Converts Scientific Papers into Interactive AI Agents
Stanford's Paper2Agent framework transforms static research papers into interactive AI agents that can execute the paper's analyses and answer queries about them. Built on the Model Context Protocol, it aims to improve reproducibility and accessibility by turning published methods into dynamic, autonomous tools for scientific exploration.
-
GitHub MCP Registry Offers a Central Hub for Discovering and Deploying MCP Servers
GitHub has recently launched its Model Context Protocol (MCP) Registry, designed to help developers discover and use AI tools directly from within their working environment. The registry currently lists over 40 MCP servers from Microsoft, GitHub, Dynatrace, Terraform, and many others.
-
OpenAI Study Investigates the Causes of LLM Hallucinations and Potential Solutions
In a recent research paper, OpenAI suggested that the tendency of LLMs to hallucinate stems from the way standard training and evaluation methods reward guessing over acknowledging uncertainty. According to the study, this insight could pave the way for new techniques to reduce hallucinations and build more trustworthy AI systems, though not everyone agrees on what counts as a hallucination in the first place.
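The incentive the paper describes can be sketched with simple expected values: under accuracy-only grading that awards 1 point for a correct answer and 0 for both wrong answers and abstentions, even a low-confidence guess scores at least as well in expectation as saying "I don't know." A minimal illustration (the scoring function and numbers are hypothetical, not taken from the paper):

```python
def expected_score(p_correct, abstain, wrong_penalty=0.0):
    """Expected score on one question: either abstain for a fixed 0 points,
    or answer and be right with probability p_correct."""
    if abstain:
        return 0.0
    return p_correct * 1.0 + (1 - p_correct) * wrong_penalty

# Accuracy-only grading: a 10%-confident guess still beats abstaining.
print(round(expected_score(0.1, abstain=False), 2))                     # → 0.1
# Penalizing wrong answers flips the incentive for low-confidence guesses.
print(round(expected_score(0.1, abstain=False, wrong_penalty=-0.5), 2))  # → -0.35
```

Under the first scheme a model trained to maximize its score should always guess; only a scheme in the spirit of the second makes admitting uncertainty the rational choice at low confidence.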
-
Claude Sonnet 4.5 Tops SWE-Bench Verified, Extends Coding Focus beyond 30 Hours
Anthropic's Claude Sonnet 4.5, billed as its most advanced coding model, tops SWE-Bench Verified and achieves a 98.7% safety score. Improved reasoning lets it sustain multi-step coding tasks for more than 30 hours, with notable gains reported by early users. Anthropic positions it as a drop-in replacement that balances capability and safety.
-
Google DeepMind Introduces CodeMender, an AI Agent for Automated Code Repair
Google DeepMind has introduced CodeMender, a new AI-driven agent designed to detect, fix, and secure software vulnerabilities automatically. The project builds on recent advances in reasoning models and program analysis, aiming to reduce the time developers spend identifying and patching security issues.
-
GitHub Introduces New Embedding Model to Improve Code Search and Context
GitHub has introduced a new embedding model for Copilot, now integrated into Visual Studio Code. The model is designed to improve how Copilot understands programming context, retrieves relevant code, and suggests completions.