GPU Content on InfoQ
-
Nvidia's GB200 NVL72 Supercomputer Achieves 2.7× Faster Inference on DeepSeek V3
In collaboration with NVIDIA, researchers from SGLang have published early benchmarks of the GB200 (Grace Blackwell) NVL72 system, showing up to a 2.7× increase in LLM inference throughput compared to the H100 on the DeepSeek-V3 671B model.
-
Microsoft Native 1-Bit LLM Could Bring Efficient genAI to Everyday CPUs
In a recent paper, Microsoft researchers described BitNet b1.58 2B4T, the first LLM to be natively trained using "1-bit" (technically, 1-trit) weights, rather than being quantized from a model trained with floating point weights. According to Microsoft, the model delivers performance comparable to full-precision LLMs of similar size at a fraction of the computation cost and hardware requirements.
-
AMD’s Gaia Framework Brings Local LLM Inference to Consumer Hardware
AMD has released Gaia, an open-source project allowing developers to run large language models (LLMs) locally on Windows machines with AMD hardware acceleration. The framework supports retrieval-augmented generation (RAG) and includes tools for indexing local data sources. Gaia is designed to offer an alternative to LLMs hosted on a cloud service provider (CSP).
-
Azure Container Apps Serverless GPUs Reach General Availability with NVIDIA NIM Support
Azure has launched Serverless GPUs for Azure Container Apps, enabling scalable, on-demand execution of AI workloads using NVIDIA A100 and T4 GPUs. The feature supports NVIDIA NIM microservices, simplifying deployment and management while optimizing costs. Because Azure manages the infrastructure, developers can focus on their applications across a range of AI scenarios.
-
Hugging Face Publishes Guide on Efficient LLM Training across GPUs
Hugging Face has published the Ultra-Scale Playbook: Training LLMs on GPU Clusters, an open-source guide that provides a detailed exploration of the methodologies and technologies involved in training LLMs across GPU clusters.
-
Nvidia Announces Arm-Powered Project Digits, Its First Personal AI Computer
Capable of running 200B-parameter models, Nvidia Project Digits packs the new Nvidia GB10 Grace Blackwell chip to allow developers to fine-tune and run AI models on their local machines. Starting at $3,000, Project Digits targets AI researchers, data scientists, and students, who can create their models on a desktop system and then deploy them on cloud or data center infrastructure.
-
Microsoft Introduces Serverless GPUs on Azure Container Apps in Public Preview
Azure Container Apps now offers serverless GPUs in public preview. The feature provides NVIDIA A100 and T4 GPUs for real-time AI inferencing and machine learning without infrastructure management. It supports scale-to-zero and per-second billing to balance performance and cost, and integrates with other Azure services.
-
NVIDIA Unveils Jetson Orin Nano Generative AI Supercomputer
NVIDIA has released the Jetson Orin Nano Super Developer Kit, a compact generative AI supercomputer. The device, which is small enough to fit in one's hand, provides increased performance for generative AI workloads.
-
QCon SF 2024: Scale Batch GPU Inference with Ray
At QCon SF 2024, Cody Yu presented how Anyscale’s Ray can handle scaling out batch inference more effectively. The problems Ray can help with include scaling to large datasets (hundreds of GBs or more), ensuring reliability across spot and on-demand instances, managing multi-stage heterogeneous compute, and managing tradeoffs between cost and latency.
-
AWS Announces General Availability of EC2 P5e Instances, Powered by NVIDIA H200 Tensor Core GPUs
Amazon Web Services (AWS) has launched EC2 P5e instances featuring NVIDIA H200 Tensor Core GPUs, substantially boosting AI and HPC performance. With enhanced memory bandwidth, these instances reduce latency for real-time applications. Suited to tasks such as LLM training and simulations, they offer improved scalability and cost-efficiency.
-
Microsoft Launches Open-Source Phi-3.5 Models for Advanced AI Development
Microsoft launched three new open-source AI models in its Phi-3.5 series: Phi-3.5-mini-instruct, Phi-3.5-MoE-instruct, and Phi-3.5-vision-instruct. Available under a permissive MIT license, these models offer developers powerful tools for various tasks, including reasoning, multilingual processing, and image and video analysis.
-
Meta's Research SuperCluster for Real-Time Voice Translation AI Systems
A recent article from Engineering at Meta describes how the company is building its Research SuperCluster (RSC) infrastructure, which is used for advances in real-time voice translation, language processing, computer vision, and augmented reality (AR).
-
NVIDIA Announces Next-Generation AI Superchip Blackwell
NVIDIA recently announced its next-generation GPU architecture, Blackwell. Blackwell is the largest GPU ever built, with over 200 billion transistors, and can train large language models (LLMs) up to 4x faster than previous-generation hardware.
-
Nvidia Announces Robotics-Oriented AI Foundational Model
At its recent GTC 2024 event, Nvidia announced a new foundational model to build intelligent humanoid robots. Dubbed GR00T, short for Generalist Robot 00 Technology, the model will understand natural language and be able to observe human actions and emulate human movements.
-
Meta Unveils 24k GPU AI Infrastructure Design
Meta recently announced the design of two new AI computing clusters, each containing 24,576 GPUs. The clusters are based on Meta's Grand Teton hardware platform, and Meta is currently using one of them to train its next-generation Llama 3 model.