InfoQ Homepage GPU Content on InfoQ
-
QCon SF 2024: Scale Batch GPU Inference with Ray
At QConSF 2024, Cody Yu presented how Anyscale’s Ray can more effectively handle scaling out batch inference. Some of the problems Ray can assist with include scaling large datasets (hundreds of GBs or more), ensuring reliability with spot and on-demand instances, managing multi-stage heterogeneous compute, and managing tradeoffs with cost and latency.
-
AWS Announces General Availability of EC2 P5e Instances, Powered by NVIDIA H100 Tensor Core GPUs
Amazon Web Services (AWS) has launched EC2 P5e instances featuring NVIDIA H100 Tensor Core GPUs, substantially boosting AI and HPC performance. With enhanced memory bandwidth, these instances reduce latency for real-time applications. Ideal for tasks like LLM training and simulations, they offer improved scalability and cost-efficiency, making them pivotal for modern cloud computing.
-
Microsoft Launches Open-Source Phi-3.5 Models for Advanced AI Development
Microsoft launched three new open-source AI models in its Phi-3.5 series: Phi-3.5-mini-instruct, Phi-3.5-MoE-instruct, and Phi-3.5-vision-instruct. Available under a permissive MIT license, these models offer developers powerful tools for various tasks, including reasoning, multilingual processing, and image and video analysis.
-
Meta's Research SuperCluster for Real-Time Voice Translation AI Systems
A recent article from Engineering at Meta reveals how the company is building Research SuperCluster (RSC) infrastructure that is used for advancements in real-time voice translations, language processing, computer vision, and augmented reality (AR).
-
NVIDIA Announces Next-Generation AI Superchip Blackwell
NVIDIA recently announced their next generation GPU architecture, Blackwell. Blackwell is the largest GPU ever built, with over 200 billion transistors, and can train large language models (LLMs) up to 4x faster than previous generation hardware.
-
Nvidia Announces Robotics-Oriented AI Foundational Model
At its recent GTC 2024 event, Nvidia announced a new foundational model to build intelligent humanoid robots. Dubbed GR00T, short for Generalist Robot 00 Technology, the model will understand natural language and be able to observe human actions and emulate human movements.
-
Meta Unveils 24k GPU AI Infrastructure Design
Meta recently announced the design of two new AI computing clusters, each containing 24,576 GPUs. The clusters are based on Meta's Grand Teton hardware platform, and one cluster is currently used by Meta for training their next-generation Llama 3 model.
-
NVIDIA Introduces Metropolis Microservices for Jetson to Run AI Apps at the Edge
NVIDIA has expanded its Nvidia Metropolis Microservices Cloud-based AI solution to run on the NVIDIA Jetson IoT embedded platform, including support for video streaming and AI-based perception.
-
LeftoverLocals May Leak LLM Responses on Apple, Qualcomm, and AMD GPUs
Security firm Trail of Bits disclosed a vulnerability allowing malicious actors to recover data from GPU local memory on Apple, Qualcomm, AMD, and Imagination GPUs. Dubbed LeftoverLocals, the vulnerability affects any application using the GPU, including Large Language Models (LLMs) and machine learning (ML) models.
-
AWS Unveils Gemini, a Distributed Training System for Swift Failure Recovery in Large Model Training
AWS and Rice University have introduced Gemini, a new distributed training system to redefine failure recovery in large-scale deep learning models. According to the research paper, Gemini adopts a daring strategy by utilizing CPU memory to ensure previously unheard-of speeds in failure recovery, overcoming obstacles related to high recovery costs and constrained checkpoint storage capacity.
-
Microsoft Releases DeepSpeed-FastGen for High-Throughput Text Generation
Microsoft has announced the alpha release of DeepSpeed-FastGen, a system designed to improve the deployment and serving of large language models (LLMs). DeepSpeed-FastGen is the synergistic composition of DeepSpeed-MII and DeepSpeed-Inference . DeepSpeed-FastGen is based on the Dynamic SplitFuse technique. The system currently supports several model architectures.
-
Python-Like Numerical Computation Library MatX Brings Transforms as Operators and Other Features
Developed by Nvidia for its own GPUs, MatX is a C++ library that aims to bring near-native performance in numerical computing using a high-level syntax not far from those available in Python scipy or MATLAB. Its latest release brings a number of new features, including the possibility to use transforms as operators, new operators such as upsample, downsample, pwelch, and more.
-
Google Cloud Ops Agent Can Now Monitor Nvidia GPUs
Google Cloud announced that Ops Agent, the agent for collecting telemetry from Compute Engine instances, can now collect and aggregate metrics from NVIDIA GPUs on VMs.
-
Azure Previews ND H100 V5 Virtual Machines to Accelerate Generative AI
Azure recently announced the preview of the ND H100 v5, virtual machines that integrate the latest Nvidia H100 Tensor Core GPUs and support Quantum-2 InfiniBand networking. According to Microsoft, the new option will offer AI developers improved performance and scaling across thousands of GPUs.
-
AWS and NVIDIA to Collaborate on Next-Gen EC2 P5 Instances for Accelerating Generative AI
AWS and NVIDIA announced the development of a highly scalable, on-demand AI infrastructure that is specifically designed for training large language models and creating advanced generative AI applications. The collaboration aims to create the most optimized and efficient system of its kind, capable of meeting the demands of increasingly complex AI tasks.