Large Language Models: Content on InfoQ
-
Google Previews Gemini's Agent Mode in Android Studio Narwhal
Google has announced that Agent Mode for Gemini in Android Studio is available in the latest canary release, the Android Studio Narwhal preview. According to Google, the new Agent Mode is designed to handle multi-step development tasks that span multiple files.
-
MiniMax Releases M1: a 456B Hybrid-Attention Model for Long-Context Reasoning and Software Tasks
MiniMax has introduced MiniMax-M1, a new open-weight reasoning model built to handle extended contexts and complex problem-solving with high efficiency. Built on top of the earlier MiniMax-Text-01, M1 features a hybrid Mixture-of-Experts (MoE) architecture and a novel “lightning attention” mechanism.
-
GPULlama3.java Brings GPU-Accelerated LLM Inference to Pure Java
The University of Manchester's Beehive Lab has released GPULlama3.java, marking the first Java-native implementation of Llama3 with automatic GPU acceleration. This project leverages TornadoVM to enable GPU-accelerated large language model inference without requiring developers to write CUDA or native code, potentially transforming how Java developers approach AI apps in enterprise environments.
-
Midjourney Debuts V1 AI Video Model
Midjourney has launched its first video generation model, V1, a web-based tool that lets users animate still images into 5-second video clips.
-
Phoenix.new Launches Remote Agent-Powered Dev Environments for Elixir
Chris McCord has released Phoenix.new, a browser-native agent platform that gives large language models full-stack control over Elixir development environments. Designed to work entirely in the cloud, Phoenix.new spins up real Phoenix apps inside ephemeral VMs, allowing LLM agents to build, test, and iterate in real time.
-
AlphaWrite: Improving AI Narratives through Evolution
AlphaWrite is a new framework designed to bring structure and measurable improvement to creative writing. Developed by Toby Simonds, it employs an evolutionary process to iteratively boost storytelling quality during inference.
-
OpenAI Launches o3-pro Model Focused on Reliability, Amid Mixed User Feedback
OpenAI has launched o3-pro, a new version of its most advanced model, aimed at delivering more reliable, thoughtful responses across complex tasks. Now available to Pro and Team users in ChatGPT and via the API, o3-pro replaces the earlier o1-pro.
-
Mistral AI Releases Magistral, Its First Reasoning-Focused Language Model
Mistral AI has released Magistral, a new model family built for transparent, multi-step reasoning. Available in open and enterprise versions, it supports structured logic, multilingual output, and traceable decision-making.
-
Meta Introduces V-JEPA 2, a Video-Based World Model for Physical Reasoning
Meta has introduced V-JEPA 2, a new video-based world model designed to improve machine understanding, prediction, and planning in physical environments. The model extends the Joint Embedding Predictive Architecture (JEPA) framework and is trained to predict outcomes in embedding space using video data.
-
Anthropic Releases Claude Code SDK to Power AI-Paired Programming
Anthropic has launched Claude Code SDK, a new toolkit that extends the reach of its code assistant, Claude, far beyond the chat interface. Designed for integration into modern developer workflows, the SDK offers a suite of tools for TypeScript, Python, and the command line, enabling advanced automation of code review, refactoring, and transformation tasks.
-
Mistral Releases Its Own Coding Assistant Mistral Code
Mistral has introduced Mistral Code, a new AI-powered development tool aimed at improving the efficiency and accuracy of coding workflows. Mistral Code uses advanced AI models to offer developers intelligent code completion, real-time suggestions, and the ability to interact with the codebase using natural language.
-
QCon AI New York 2025: Program Committee Announced
Meet the QCon AI New York Program Committee, senior software leaders shaping a practical AI conference for engineers building at scale.
-
Google Cloud Run Now Offers Serverless GPUs for AI and Batch Processing
Google Cloud has launched NVIDIA GPU support for Cloud Run, enhancing its serverless platform with scalable, cost-efficient GPU resources. The upgrade enables rapid AI inference and batch processing, with pay-per-second billing and automatic scaling to zero, making GPU-backed workloads accessible without managing infrastructure.
-
Anthropic Open-Sources Tool to Trace the "Thoughts" of Large Language Models
Anthropic researchers have open-sourced the tool they used to trace what goes on inside a large language model during inference. It includes a circuit-tracing Python library that can be used with any open-weights model and a frontend hosted on Neuronpedia to explore the library's output through a graph.
-
Introducing ANS: DNS-Inspired Secure Discovery for AI Agents
The Open Worldwide Application Security Project (OWASP) has recently introduced a new standard for securely discovering AI agents. Inspired by DNS, the Agent Name Service (ANS) provides a protocol-agnostic registry mechanism that uses Public Key Infrastructure (PKI) to establish agent identity and trust.