GPU Content on InfoQ

News

Cloud

AWS Hikes EC2 Capacity Block Rates by 15% in Uniform ML Pricing Adjustment

AWS has raised EC2 Capacity Block prices for ML by 15% across all regions, impacting GPU-based workloads. The uniform increase affects top-tier instances powered by NVIDIA GPUs and reflects supply chain pressures and inflation. With few alternatives available, organizations face higher costs and a greater need for workload optimization and cost management.

Steef-Jan Wiggers
on Jan 15, 2026

Java

TornadoVM 2.0 Brings Automatic GPU Acceleration and LLM Support to Java

The TornadoVM project recently reached version 2.0, a major milestone for the open-source project that aims to provide a heterogeneous hardware runtime for Java. The project automatically accelerates Java programs on multi-core CPUs, GPUs, and FPGAs. This release is likely to be of particular interest to teams developing LLM solutions on the JVM.

Ben Evans
on Dec 17, 2025

DevOps

NVIDIA Dynamo Addresses Multi-Node LLM Inference Challenges

Serving large language models (LLMs) at scale is complex. Modern LLMs now exceed the memory and compute capacity of a single GPU, or even of a single multi-GPU node. As a result, inference workloads for models with 70B+ or 120B+ parameters, or for pipelines with large context windows, require multi-node, distributed GPU deployments.
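
As a rough illustration of why models of this size outgrow a single GPU, the sketch below estimates the memory needed for the weights alone. It assumes 16-bit weights (2 bytes per parameter) and an 80 GB accelerator; both figures, and the function names, are assumptions chosen for illustration rather than details from the article. KV cache and activation memory are ignored, so real requirements are higher.

```python
import math


def weight_memory_gb(num_params: float, bytes_per_param: float = 2.0) -> float:
    """Approximate memory for model weights alone, in GB (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


def min_gpus_for_weights(
    num_params: float,
    gpu_memory_gb: float = 80.0,   # assumed per-GPU memory
    bytes_per_param: float = 2.0,  # assumed 16-bit weights
) -> int:
    """Lower bound on GPU count needed just to hold the weights."""
    return math.ceil(weight_memory_gb(num_params, bytes_per_param) / gpu_memory_gb)


if __name__ == "__main__":
    for billions in (70, 120):
        params = billions * 1e9
        print(
            f"{billions}B params: {weight_memory_gb(params):.0f} GB of weights, "
            f"at least {min_gpus_for_weights(params)} GPUs"
        )
```

Under these assumptions, a 70B-parameter model already needs more than one such GPU before any serving overhead is counted.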

Claudio Masolo
on Dec 04, 2025

AI, ML & Data Engineering

SAM 3 Introduces a More Capable Segmentation Architecture for Modern Vision Workflows

Meta has released SAM 3, the latest version of its Segment Anything Model and the most substantial update to the project since its initial launch. Built to provide more stable and context-aware segmentation, the model offers improvements in accuracy, boundary quality, and robustness to real-world scenes, aiming to make segmentation more reliable across research and production systems.

Robert Krzaczyński
on Nov 26, 2025

Cloud

IBM Cloud Code Engine Serverless Fleets with GPUs for High-Performance AI and Parallel Computing

IBM Cloud Code Engine’s new Serverless Fleets targets compute-intensive enterprise tasks. With integrated GPU support, it runs large-scale workloads under a fully managed, pay-as-you-go model, removing operational overhead so that developers can focus on their applications while keeping costs and scaling under control.

Steef-Jan Wiggers
on Oct 16, 2025

AI, ML & Data Engineering

Nvidia's GB200 NVL72 Supercomputer Achieves 2.7× Faster Inference on DeepSeek V3

In collaboration with NVIDIA, researchers from SGLang have published early benchmarks of the GB200 (Grace Blackwell) NVL72 system, showing up to a 2.7× increase in LLM inference throughput compared to the H100 on the DeepSeek-V3 671B model.

Matt Foster
on Jun 29, 2025

AI, ML & Data Engineering

Microsoft Native 1-Bit LLM Could Bring Efficient genAI to Everyday CPUs

In a recent paper, Microsoft researchers described BitNet b1.58 2B4T, the first LLM to be natively trained using "1-bit" (technically, 1-trit) weights, rather than being quantized from a model trained with floating point weights. According to Microsoft, the model delivers performance comparable to full-precision LLMs of similar size at a fraction of the computation cost and hardware requirements.
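
The "1.58" in the model name corresponds to the information content of a three-valued weight. A minimal sketch, assuming each weight takes one of three values such as -1, 0, and +1 (the ternary set implied by "1-trit"); the function name is hypothetical:

```python
import math


def bits_per_weight(num_levels: int) -> float:
    """Information content, in bits, of a weight with `num_levels` possible values."""
    return math.log2(num_levels)


if __name__ == "__main__":
    # A binary weight carries exactly 1 bit; a ternary weight carries log2(3) bits.
    print(f"binary:  {bits_per_weight(2):.2f} bits")
    print(f"ternary: {bits_per_weight(3):.2f} bits")
```

This is why a ternary weight is described as roughly 1.58 bits rather than exactly 1 bit.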

Sergio De Simone
on Apr 23, 2025

Architecture & Design

AMD’s Gaia Framework Brings Local LLM Inference to Consumer Hardware

AMD has released Gaia, an open-source project allowing developers to run large language models (LLMs) locally on Windows machines with AMD hardware acceleration. The framework supports retrieval-augmented generation (RAG) and includes tools for indexing local data sources. Gaia is designed to offer an alternative to LLMs hosted on a cloud service provider (CSP).

Matt Foster
on Apr 08, 2025

Cloud

Azure Container Apps Serverless GPUs Reach General Availability with NVIDIA NIM Support

Azure has launched Serverless GPUs for Azure Container Apps, enabling scalable, on-demand execution of AI workloads using NVIDIA A100 and T4 GPUs. The feature supports NVIDIA NIM microservices, which simplifies deployment and management while helping to control costs. Because Azure manages the infrastructure, developers can focus on their applications across a range of AI scenarios.

Steef-Jan Wiggers
on Apr 01, 2025

AI, ML & Data Engineering

Hugging Face Publishes Guide on Efficient LLM Training across GPUs

Hugging Face has published the Ultra-Scale Playbook: Training LLMs on GPU Clusters, an open-source guide that provides a detailed exploration of the methodologies and technologies involved in training LLMs across GPU clusters.

Daniel Dominguez
on Mar 04, 2025

AI, ML & Data Engineering

Nvidia Announces Arm-Powered Project Digits, Its First Personal AI Computer

Capable of running 200B-parameter models, Nvidia Project Digits packs the new Nvidia GB10 Grace Blackwell chip so that developers can fine-tune and run AI models on their local machines. Starting at $3,000, Project Digits targets AI researchers, data scientists, and students, letting them create models on a desktop system and then deploy them on cloud or data center infrastructure.

Sergio De Simone
on Jan 13, 2025

Cloud

Microsoft Introduces Serverless GPUs on Azure Container Apps in Public Preview

Microsoft has introduced serverless GPUs for Azure Container Apps in public preview. The feature offers NVIDIA A100 and T4 GPUs for real-time AI inferencing and machine learning without infrastructure management. Scale-to-zero and per-second billing help balance performance and cost, and the service integrates with the rest of Azure.

Steef-Jan Wiggers
on Dec 31, 2024

AI, ML & Data Engineering

NVIDIA Unveils Jetson Orin Nano Generative AI Supercomputer

NVIDIA has released the Jetson Orin Nano Super Developer Kit, a compact generative AI supercomputer. The device, which is small enough to fit in one's hand, provides increased performance for generative AI workloads.

Daniel Dominguez
on Dec 20, 2024

AI, ML & Data Engineering

QCon SF 2024: Scale Batch GPU Inference with Ray

At QCon SF 2024, Cody Yu presented how Anyscale’s Ray can handle scaling out batch inference more effectively. Problems Ray can assist with include scaling to large datasets (hundreds of GBs or more), ensuring reliability with spot and on-demand instances, managing multi-stage heterogeneous compute, and managing tradeoffs between cost and latency.

Andrew Hoblitzell
on Nov 22, 2024

Cloud

AWS Announces General Availability of EC2 P5e Instances, Powered by NVIDIA H200 Tensor Core GPUs

Amazon Web Services (AWS) has launched EC2 P5e instances featuring NVIDIA H200 Tensor Core GPUs, substantially boosting AI and HPC performance. With enhanced memory bandwidth, these instances reduce latency for real-time applications. They are suited to tasks such as LLM training and simulations, and offer improved scalability and cost-efficiency.

Steef-Jan Wiggers
on Sep 18, 2024