InfoQ Software Architects' Newsletter

A monthly overview of things you need to know as an architect or aspiring architect.


GPU Content on InfoQ

News

Cloud

Azure Container Apps Serverless GPUs Reach General Availability with NVIDIA NIM Support

Azure has made Serverless GPUs for Azure Container Apps generally available, enabling scalable, on-demand execution of AI workloads on NVIDIA A100 and T4 GPUs. The feature supports NVIDIA NIM microservices, which simplify deployment and management, and is designed to optimize costs. Because Azure manages the underlying infrastructure, developers can focus on their applications, making the service a flexible option for a range of AI scenarios.

Steef-Jan Wiggers
on Apr 01, 2025
AI, ML & Data Engineering

Hugging Face Publishes Guide on Efficient LLM Training across GPUs

Hugging Face has published the Ultra-Scale Playbook: Training LLMs on GPU Clusters, an open-source guide that provides a detailed exploration of the methodologies and technologies involved in training LLMs across GPU clusters.

Daniel Dominguez
on Mar 04, 2025
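Data parallelism is one of the core techniques for training across many GPUs: each worker computes gradients on its own shard of the global batch, and an all-reduce averages those gradients so every worker applies the same update. The following is a minimal pure-Python sketch of that idea, assuming a toy one-parameter linear model and equal-sized shards. The function names are hypothetical, and a real setup would use a framework's collective operations rather than a Python list.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (x, y) pair for a toy model y ~= w * x


def grad_mse(w: float, data: Sequence[Sample]) -> float:
    """Gradient of the mean squared error with respect to w on one shard."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)


def shard(data: Sequence[Sample], n_workers: int) -> List[Sequence[Sample]]:
    """Split the global batch into equal shards, one per (simulated) GPU."""
    assert len(data) % n_workers == 0, "equal shards keep the averaged gradient exact"
    size = len(data) // n_workers
    return [data[i * size:(i + 1) * size] for i in range(n_workers)]


def all_reduce_mean(values: List[float]) -> float:
    """Stand-in for an all-reduce: every worker ends up with the mean gradient."""
    return sum(values) / len(values)


def data_parallel_step(w: float, data: Sequence[Sample], n_workers: int, lr: float) -> float:
    """One synchronous data-parallel SGD step."""
    # On real hardware each shard's gradient is computed on a different GPU.
    local_grads = [grad_mse(w, s) for s in shard(data, n_workers)]
    return w - lr * all_reduce_mean(local_grads)


if __name__ == "__main__":
    data = [(float(x), 3.0 * x) for x in range(1, 9)]  # true weight is 3
    w = 0.0
    for _ in range(50):
        w = data_parallel_step(w, data, n_workers=4, lr=0.01)
    print(round(w, 4))
```

Because the shards are equal in size, the mean of the per-shard gradients equals the full-batch gradient, so the data-parallel step matches single-device training on the whole batch.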
AI, ML & Data Engineering

Nvidia Announces Arm-Powered Project Digits, Its First Personal AI Computer

Capable of running models with up to 200B parameters, Nvidia Project Digits is built around the new Nvidia GB10 Grace Blackwell chip and lets developers fine-tune and run AI models on their local machines. Starting at $3,000, Project Digits targets AI researchers, data scientists, and students, allowing them to build models on a desktop system and then deploy them on cloud or data center infrastructure.

Sergio De Simone
on Jan 13, 2025
Cloud

Microsoft Introduces Serverless GPUs on Azure Container Apps in Public Preview

Microsoft has released serverless GPUs for Azure Container Apps in public preview. The feature provides NVIDIA A100 and T4 GPUs for real-time AI inferencing and machine learning workloads without requiring users to manage infrastructure. It supports scale-to-zero and per-second billing, which helps balance performance and cost, and integrates with other Azure services.

Steef-Jan Wiggers
on Dec 31, 2024
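Scale-to-zero combined with per-second billing means cost tracks the time a GPU replica is actually running rather than wall-clock time. The following is a rough sketch of that arithmetic. It assumes a single replica that stays up for each burst of requests plus a fixed cooldown before scaling to zero. The cooldown value and traffic pattern are hypothetical, and cold-start time and actual Azure pricing are ignored.

```python
from typing import List, Optional, Tuple


def billed_seconds(bursts: List[Tuple[int, int]], cooldown_s: int) -> int:
    """Seconds a single scale-to-zero GPU replica is running.

    Each burst is (start_second, duration_seconds). The replica runs for the
    burst plus a cooldown before scaling to zero; overlapping windows merge.
    """
    windows = sorted((start, start + duration + cooldown_s) for start, duration in bursts)
    total = 0
    cur_start: Optional[int] = None
    cur_end: Optional[int] = None
    for start, end in windows:
        if cur_end is not None and start <= cur_end:
            cur_end = max(cur_end, end)  # replica still warm: extend the window
            continue
        if cur_end is not None and cur_start is not None:
            total += cur_end - cur_start  # close the previous running window
        cur_start, cur_end = start, end
    if cur_end is not None and cur_start is not None:
        total += cur_end - cur_start
    return total


if __name__ == "__main__":
    bursts = [(0, 60), (100, 60), (1_000, 30)]  # hypothetical traffic within one hour
    running = billed_seconds(bursts, cooldown_s=300)  # hypothetical cooldown
    print(f"billed {running} s vs 3600 s for an always-on replica")
```

With a per-second price p, the cost is simply `billed_seconds(...) * p`; in this example the replica is billed for 790 of the 3,600 seconds in the hour.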
AI, ML & Data Engineering

NVIDIA Unveils Jetson Orin Nano Generative AI Supercomputer

NVIDIA has released the Jetson Orin Nano Super Developer Kit, a compact generative AI supercomputer. The device is small enough to fit in one's hand and delivers increased performance for generative AI workloads.

Daniel Dominguez
on Dec 20, 2024
AI, ML & Data Engineering

QCon SF 2024: Scale Batch GPU Inference with Ray

At QCon SF 2024, Cody Yu presented how Anyscale’s Ray can handle scaling out batch inference more effectively. Problems Ray can help with include processing large datasets (hundreds of GBs or more), ensuring reliability across spot and on-demand instances, managing multi-stage heterogeneous compute, and balancing the trade-offs between cost and latency.

Andrew Hoblitzell
on Nov 22, 2024
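The batch-inference pattern that Ray automates can be illustrated without Ray itself. Split the dataset into batches, run them on a pool of workers, and retry any batch whose worker fails (for example, when a spot instance is reclaimed). The sketch below uses only Python's standard library and is not Ray's API. `FlakyModel` is a hypothetical stand-in for a GPU model that fails once per batch to simulate preemption.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

Batch = List[int]


def make_batches(items: Sequence[int], batch_size: int) -> List[Batch]:
    """Split the input into fixed-size batches; the last batch may be smaller."""
    return [list(items[i:i + batch_size]) for i in range(0, len(items), batch_size)]


def run_with_retries(model: Callable[[Batch], Batch], batch: Batch, max_attempts: int = 3) -> Batch:
    """Re-run a batch if its worker fails, e.g. because a spot instance disappeared."""
    last_error: Exception = RuntimeError("max_attempts must be >= 1")
    for _ in range(max_attempts):
        try:
            return model(batch)
        except RuntimeError as err:
            last_error = err  # remember the failure and try again
    raise last_error


def batch_inference(items: Sequence[int], model: Callable[[Batch], Batch],
                    batch_size: int = 4, workers: int = 2) -> List[int]:
    """Run the model over all batches in parallel; results keep input order."""
    batches = make_batches(items, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in the order of its inputs.
        outputs = list(pool.map(lambda b: run_with_retries(model, b), batches))
    return [y for out in outputs for y in out]


class FlakyModel:
    """Hypothetical model that fails the first call for each batch (simulated preemption)."""

    def __init__(self) -> None:
        self._seen: set = set()
        self._lock = threading.Lock()

    def __call__(self, batch: Batch) -> Batch:
        key = tuple(batch)
        with self._lock:
            first_call = key not in self._seen
            self._seen.add(key)
        if first_call:
            raise RuntimeError("simulated spot-instance preemption")
        return [2 * x for x in batch]  # placeholder "inference"


if __name__ == "__main__":
    print(batch_inference(list(range(10)), FlakyModel()))
```

Retrying at batch granularity means a lost worker only costs the batch it was processing, which is the property that makes cheaper spot capacity usable for large offline jobs.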
Cloud

AWS Announces General Availability of EC2 P5e Instances, Powered by NVIDIA H200 Tensor Core GPUs

Amazon Web Services (AWS) has launched EC2 P5e instances featuring NVIDIA H200 Tensor Core GPUs, substantially boosting AI and HPC performance. Their higher memory bandwidth reduces latency for real-time applications. The instances are suited to tasks such as LLM training and simulations and offer improved scalability and cost-efficiency.

Steef-Jan Wiggers
on Sep 18, 2024
AI, ML & Data Engineering

Microsoft Launches Open-Source Phi-3.5 Models for Advanced AI Development

Microsoft launched three new open-source AI models in its Phi-3.5 series: Phi-3.5-mini-instruct, Phi-3.5-MoE-instruct, and Phi-3.5-vision-instruct. Available under a permissive MIT license, these models offer developers powerful tools for various tasks, including reasoning, multilingual processing, and image and video analysis.

Robert Krzaczyński
on Aug 31, 2024
AI, ML & Data Engineering

Meta's Research SuperCluster for Real-Time Voice Translation AI Systems

A recent article from Engineering at Meta describes the company's Research SuperCluster (RSC) infrastructure, which supports its work on real-time voice translation, language processing, computer vision, and augmented reality (AR).

Vinod Goje
on Aug 21, 2024
AI, ML & Data Engineering

NVIDIA Announces Next-Generation AI Superchip Blackwell

NVIDIA recently announced its next-generation GPU architecture, Blackwell. According to NVIDIA, Blackwell is the largest GPU ever built, with over 200 billion transistors, and can train large language models (LLMs) up to 4x faster than previous-generation hardware.

Anthony Alford
on Apr 09, 2024
AI, ML & Data Engineering

Nvidia Announces Robotics-Oriented AI Foundational Model

At its recent GTC 2024 event, Nvidia announced a new foundation model for building intelligent humanoid robots. Dubbed GR00T, short for Generalist Robot 00 Technology, the model is designed to understand natural language, observe human actions, and emulate human movements.

Sergio De Simone
on Apr 05, 2024
AI, ML & Data Engineering

Meta Unveils 24k GPU AI Infrastructure Design

Meta recently announced the design of two new AI computing clusters, each containing 24,576 GPUs. The clusters are based on Meta's Grand Teton hardware platform, and Meta is currently using one of them to train its next-generation Llama 3 model.

Anthony Alford
on Apr 02, 2024
AI, ML & Data Engineering

NVIDIA Introduces Metropolis Microservices for Jetson to Run AI Apps at the Edge

NVIDIA has expanded Metropolis Microservices, its cloud-based AI solution, to run on the NVIDIA Jetson IoT embedded platform, including support for video streaming and AI-based perception.

Sergio De Simone
on Feb 08, 2024
DevOps

LeftoverLocals May Leak LLM Responses on Apple, Qualcomm, and AMD GPUs

Security firm Trail of Bits disclosed a vulnerability allowing malicious actors to recover data from GPU local memory on Apple, Qualcomm, AMD, and Imagination GPUs. Dubbed LeftoverLocals, the vulnerability affects any application using the GPU, including Large Language Models (LLMs) and machine learning (ML) models.

Sergio De Simone
on Jan 25, 2024
AI, ML & Data Engineering

AWS Unveils Gemini, a Distributed Training System for Swift Failure Recovery in Large Model Training

AWS and Rice University have introduced Gemini, a new distributed training system designed to speed up failure recovery in large-scale deep learning model training. According to the research paper, Gemini stores checkpoints in CPU memory to achieve much faster failure recovery than previous approaches, addressing the high cost of recovery and the limited capacity of checkpoint storage.

Daniel Dominguez
on Nov 10, 2023
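The general idea of in-memory checkpointing can be sketched as a toy model. Each machine keeps its latest checkpoint in CPU memory and replicates it to a peer, so a failed machine can be restored from a surviving peer's RAM instead of slower remote storage. This is a simplification under stated assumptions. The simple ring placement and the `InMemoryCheckpointStore` class are hypothetical and do not reproduce Gemini's own checkpoint placement or traffic-scheduling strategy.

```python
import copy
from typing import Dict, List, Optional


class InMemoryCheckpointStore:
    """Toy model: checkpoints live in (simulated) CPU RAM, replicated to peers."""

    def __init__(self, machines: List[str], replicas: int = 2) -> None:
        self.machines = machines
        self.replicas = replicas
        # ram[host][owner] holds the copy of owner's checkpoint stored on host.
        self.ram: Dict[str, Dict[str, dict]] = {m: {} for m in machines}
        self.alive = set(machines)

    def _placement(self, owner: str) -> List[str]:
        """Hypothetical ring placement: the owner plus the next machines in order."""
        i = self.machines.index(owner)
        return [self.machines[(i + k) % len(self.machines)] for k in range(self.replicas)]

    def save(self, owner: str, state: dict) -> None:
        """Copy the training state into the RAM of every live replica host."""
        for host in self._placement(owner):
            if host in self.alive:
                self.ram[host][owner] = copy.deepcopy(state)

    def fail(self, machine: str) -> None:
        """A failed machine loses everything held in its CPU memory."""
        self.alive.discard(machine)
        self.ram[machine].clear()

    def recover(self, owner: str) -> Optional[dict]:
        """Restore from any surviving replica; None means fall back to remote storage."""
        for host in self._placement(owner):
            if host in self.alive and owner in self.ram[host]:
                return copy.deepcopy(self.ram[host][owner])
        return None


if __name__ == "__main__":
    store = InMemoryCheckpointStore(["m0", "m1", "m2", "m3"], replicas=2)
    store.save("m1", {"step": 100})
    store.fail("m1")
    print(store.recover("m1"))  # restored from m2's memory
```

Recovery only falls back to persistent storage when every replica holding a checkpoint has failed, which is why replicating to more peers trades extra CPU memory for a higher chance of fast recovery.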
