Large language models Content on InfoQ

News

DevOps

NVIDIA Dynamo Addresses Multi-Node LLM Inference Challenges

Serving large language models (LLMs) at scale is complex. Modern LLMs now exceed the memory and compute capacity of a single GPU or even a single multi-GPU node. As a result, inference workloads for models with 70B+ or 120B+ parameters, or for pipelines with large context windows, require multi-node, distributed GPU deployments.

Claudio Masolo
on Dec 04, 2025

Cloud

Arm Launches AI-Powered Copilot Assistant to Migrate Workflows to Arm Cloud Compute

At the recent GitHub Universe 2025 developer conference, Arm unveiled the Cloud migration assistant custom agent, a tool designed to help developers automate, optimize, and accelerate the migration of their x86 cloud workflows to Arm infrastructure.

Sergio De Simone
on Dec 03, 2025

AI, ML & Data Engineering

Memori Expands into a Full-Scale Memory Layer for AI Agents across SQL and MongoDB

Memori is an open-source memory system that gives AI agents structured, long-term memory using standard SQL databases and MongoDB. It integrates into existing frameworks, enabling efficient data extraction and retrieval without vendor lock-in. Memori's modular design is aimed at reliability and scalability for intelligent systems.

Robert Krzaczyński
on Dec 03, 2025

Mobile

Google's New LiteRT Accelerator Supercharges AI Workloads on Snapdragon-powered Android Devices

Google has introduced a new accelerator for LiteRT, called Qualcomm AI Engine Direct (QNN), to enhance on-device AI performance on Qualcomm-powered Android devices equipped with Snapdragon 8 SoCs. The accelerator delivers significant gains, offering up to a 100x speedup over CPU execution and 10x over GPU.

Sergio De Simone
on Nov 30, 2025

AI, ML & Data Engineering

Google Launches Agent Development Kit for Go

Google has added support for the Go language to its Agent Development Kit (ADK), enabling Go developers to build and manage agents in an idiomatic way that leverages the language's strong concurrency and typing features.

Sergio De Simone
on Nov 25, 2025

AI, ML & Data Engineering

Google Brings Colab Integration to Visual Studio Code

Google has announced the availability of a new Visual Studio Code extension that connects local notebooks to a Colab runtime. This allows developers to unify their previously separate local development setup and web-based Colab environment.

Sergio De Simone
on Nov 24, 2025

AI, ML & Data Engineering

AnyLanguageModel: Unified API for Local and Cloud LLMs on Apple Platforms

Developers on Apple platforms often face a fragmented ecosystem when using language models. Local models via Core ML or MLX offer privacy and offline capabilities, while cloud services like OpenAI, Anthropic, or Google Gemini provide advanced features. AnyLanguageModel, a new Swift package, simplifies integration by offering a unified API for both local and remote models.

Robert Krzaczyński
on Nov 24, 2025

AI, ML & Data Engineering

Olmo 3 Release Provides Full Transparency into Model Development and Training

The Allen Institute for AI has unveiled Olmo 3, an open-source language model family that empowers developers with full access to the model lifecycle, from training datasets to checkpoints. Featuring reasoning-focused variants and robust tools for post-training modifications, Olmo 3 promotes transparency, experimentation, and community collaboration, driving innovations in AI.

Robert Krzaczyński
on Nov 22, 2025

Architecture & Design

AI-Generated Code Creates New Wave of Technical Debt, Report Finds

AI-generated code is “highly functional but systematically lacking in architectural judgment”, a new report from Ox Security has found. Released in late October under the title Army of Juniors: The AI Code Security Crisis, the report from the AI application security (AppSec) company outlines 10 architecture and security anti-patterns commonly found in AI-generated code.

Patrick Farry
on Nov 18, 2025

AI, ML & Data Engineering

Code Arena Launches as a New Benchmark for Real-World AI Coding Performance

LMArena has launched Code Arena, a new evaluation platform that measures AI models' performance in building complete applications instead of just generating code snippets. It emphasizes agentic behavior, allowing models to plan, scaffold, iterate, and refine code within controlled environments that replicate actual development workflows.

Robert Krzaczyński
on Nov 17, 2025

AI, ML & Data Engineering

Anthropic Adds Sandboxing and Web Access to Claude Code for Safer AI-Powered Coding

Anthropic released sandboxing capabilities for Claude Code and launched a web-based version of the tool that runs in isolated cloud environments. The company introduced these features to address security risks that arise when Claude Code writes, tests, and debugs code with broad access to developer codebases and files.

Vinod Goje
on Nov 14, 2025

AI, ML & Data Engineering

Google Unveils Project Suncatcher, Envisioning AI Models Running in Space

Google has unveiled Project Suncatcher, a research initiative exploring how solar-powered satellite constellations equipped with Tensor Processing Units (TPUs) could one day enable large-scale artificial intelligence computation in space.

Daniel Dominguez
on Nov 14, 2025

AI, ML & Data Engineering

New Claude Haiku 4.5 Model Promises Faster Performance at One-Third the Cost

Anthropic released Claude Haiku 4.5, making the model available to all users as its latest entry in the small, fast model category. The company positions the new model as delivering performance levels comparable to Claude Sonnet 4, which launched five months ago as a state-of-the-art model, but at "one-third the cost and more than twice the speed."

Vinod Goje
on Nov 12, 2025

AI, ML & Data Engineering

Anthropic Finds LLMs Can Be Poisoned Using Small Number of Documents

Anthropic's Alignment Science team released a study on poisoning attacks on LLM training. The experiments covered a range of model sizes and datasets, and found that only 250 malicious examples in pre-training data were needed to create a "backdoor" vulnerability. Anthropic concludes that these attacks actually become easier as models scale up.

Anthony Alford
on Nov 11, 2025

AI, ML & Data Engineering

CodeClash Benchmarks LLMs through Multi-Round Coding Competitions

Researchers from Stanford, Princeton, and Cornell have developed a new benchmark to better evaluate the coding abilities of large language models (LLMs). Called CodeClash, the new benchmark pits LLMs against each other in multi-round tournaments to assess their capacity to achieve competitive, high-level objectives beyond narrowly defined, task-specific problems.

Sergio De Simone
on Nov 10, 2025
