In its recent Accelerated Data Center Premiere Keynote, AMD unveiled its Instinct MI200 accelerator series, comprising the flagship Instinct MI250X and the slightly lower-end Instinct MI250 GPUs.
To date, they are AMD's highest-performing server accelerators, outperforming the previous Instinct MI100 and competing with Nvidia's latest Ampere-series GPUs (e.g. the A100). Built on the CDNA 2 architecture with TSMC's 6nm FinFET process, the high-end MI250X delivers 47.9 TFLOPS of peak double-precision (FP64) performance and 128 GB of HBM2e memory, enough to train larger deep networks with less model sharding. Full technical specifications can be found on the official product page.
Fig 1: AMD ROCm 5.0 deep learning and HPC stack components. More information is available in the ROCm Learning Center.
AMD is known for its support of open-source parallelization libraries. At a low level, the AMD ROCm (Radeon Open Compute) 4.5 release enabled parallelization via OpenCL, OpenMP, and HIP (Heterogeneous-computing Interface for Portability), which offers CUDA-compatible programming. The 5.0 release (Fig. 1) extends ROCm support to the new MI200-series GPUs and brings additional optimizations. At a high level, AMD supports ONNX, PyTorch, TensorFlow, MXNet, and CuPy on its platforms, making machine-learning code portable. Given the floating-point throughput of the new MI200 series, one can expect contributors to accelerate porting the remaining APIs in these libraries.
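HIP's portability comes from mirroring the CUDA programming model almost one-to-one: kernels, launch syntax, and memory-management calls look the same, with `cuda*` functions replaced by `hip*` equivalents. As a minimal sketch (it needs the ROCm toolchain and an AMD GPU, and is compiled with `hipcc`), a vector addition looks like its CUDA counterpart:

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

// Kernel syntax is identical to CUDA: __global__, blockIdx, threadIdx.
__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    float *ha = new float[n], *hb = new float[n], *hc = new float[n];
    for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

    // The hip* memory API is a near drop-in replacement for cuda*.
    float *da, *db, *dc;
    hipMalloc(&da, bytes); hipMalloc(&db, bytes); hipMalloc(&dc, bytes);
    hipMemcpy(da, ha, bytes, hipMemcpyHostToDevice);
    hipMemcpy(db, hb, bytes, hipMemcpyHostToDevice);

    // hipcc also accepts the familiar triple-chevron launch syntax.
    vecAdd<<<(n + 255) / 256, 256>>>(da, db, dc, n);

    hipMemcpy(hc, dc, bytes, hipMemcpyDeviceToHost);
    printf("c[0] = %f\n", hc[0]);

    hipFree(da); hipFree(db); hipFree(dc);
    delete[] ha; delete[] hb; delete[] hc;
    return 0;
}
```

The HIPIFY tools automate most of this `cuda*`-to-`hip*` translation for existing CUDA codebases, which is why porting the high-level frameworks is largely a mechanical effort.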
AMD has significantly improved its container support by establishing the Infinity Hub, which offers performance-tuned container images for its accelerators. In addition to the official PyTorch and TensorFlow images, Infinity Hub hosts tools for applications requiring high-performance computing and parallelization on AMD hardware.
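In practice, getting started with one of these images looks like a standard Docker workflow; a sketch, assuming the `rocm/pytorch` image name on Docker Hub, with the ROCm device nodes passed through to the container:

```shell
# Pull the performance-tuned PyTorch image (image name assumed).
docker pull rocm/pytorch

# Run it with the ROCm device nodes exposed to the container.
docker run -it --device=/dev/kfd --device=/dev/dri \
    --security-opt seccomp=unconfined rocm/pytorch
```

The `/dev/kfd` (kernel fusion driver) and `/dev/dri` device mounts are what give the containerized framework access to the GPUs.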
For visual computing applications, AMD maintains a separate MIVisionX project within the GPUOpen graphics ecosystem, with platform-optimized encoding, decoding, and processing modules built around standards such as OpenVX. The company also offers a machine-learning primitives library, MIOpen, whose APIs (including general matrix multiplication and other linear-algebra operations) can be used for close-to-the-metal development with the OpenCL and HIP compilers.
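MIOpen follows a handle-and-descriptor style familiar from cuDNN. A minimal sketch of that pattern using the MIOpen C API (a real program would also create convolution descriptors, select an algorithm, and check every returned status):

```cpp
#include <miopen/miopen.h>

int main() {
    // Create a library context, analogous to a cuDNN handle.
    miopenHandle_t handle;
    miopenCreate(&handle);

    // Describe a 4-D activation tensor (NCHW): batch 1, 3 channels, 224x224.
    miopenTensorDescriptor_t desc;
    miopenCreateTensorDescriptor(&desc);
    miopenSet4dTensorDescriptor(desc, miopenFloat, 1, 3, 224, 224);

    // ... set up operation descriptors and launch primitives here ...

    miopenDestroyTensorDescriptor(desc);
    miopenDestroy(handle);
    return 0;
}
```

Because the descriptors are opaque and the calls are C-style, the same code path works whether MIOpen was built against the HIP or the OpenCL backend.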
The MI200 GPUs are expected to reach cloud platforms in 2022. For more detail, consult the AMD CDNA 2 architecture white paper.