PyTorch 1.10 Release Includes CUDA Graphs APIs, Compiler Improvements, and Android NNAPI Support


PyTorch, Facebook's open-source deep-learning framework, has released version 1.10, which includes an integration with CUDA Graphs APIs and JIT compiler updates to improve CPU performance, as well as beta support for the Android Neural Networks API (NNAPI). New versions of the domain-specific libraries TorchVision and TorchAudio were also released.

The PyTorch team highlighted the major features of the release in a recent blog post. The new release moves several distributed-training features, along with the FX and torch.special modules, from beta status to stable. It also includes several updates for improved CPU performance, including an integration with CUDA Graphs APIs to reduce CPU overheads and an LLVM-based JIT compiler that can fuse multiple operations. Support for Android NNAPI has moved from prototype to beta, including the ability to run models on a host for test purposes. The release also includes TorchX, a new SDK for faster production deployment of deep-learning applications. Overall, version 1.10 contains more than 3,400 commits from 426 contributors since the 1.9 release.

Prototype support for Android NNAPI, which allows Android apps to use hardware accelerators such as GPUs and Neural Processing Units (NPUs), was added last year. The new release moves the feature to beta and adds several capabilities, including coverage of more operations, load-time flexible tensor shapes, and the ability to test a model on a mobile host. Another beta feature in the release is the CUDA Graphs APIs integration. CUDA Graphs improves runtime performance for CPU-bound workloads by capturing and replaying a stream of work sent to a GPU; this trades off the flexibility of dynamic execution in exchange for skipping setup and dispatch of work, reducing the overhead on the CPU.
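The capture-and-replay pattern can be sketched with the `torch.cuda.CUDAGraph` and `torch.cuda.graph` APIs introduced as beta in this release. This is a minimal illustration, not the full recommended workflow; the helper function and tensor shapes are arbitrary, and the code falls back to eager execution on machines without a GPU.

```python
import torch

def scale_and_shift(x):
    # The work to capture: a few small kernels whose per-launch
    # CPU overhead CUDA Graphs can amortize by replaying them.
    return x * 2.0 + 1.0

if torch.cuda.is_available():
    static_input = torch.zeros(8, device="cuda")

    # Warm up on a side stream before capture, as the docs recommend.
    s = torch.cuda.Stream()
    s.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(s):
        scale_and_shift(static_input)
    torch.cuda.current_stream().wait_stream(s)

    # Capture the stream of work into a graph.
    g = torch.cuda.CUDAGraph()
    with torch.cuda.graph(g):
        static_output = scale_and_shift(static_input)

    # Replay with new data: copy into the captured input buffer,
    # then replay the whole graph with a single CPU-side launch.
    static_input.copy_(torch.ones(8, device="cuda"))
    g.replay()
    out = static_output.clone().cpu()
else:
    # Eager fallback on CPU-only machines (hypothetical convenience
    # for this sketch; real graph capture requires a CUDA device).
    out = scale_and_shift(torch.ones(8))
```

The trade-off mentioned above is visible here: the input must be copied into a fixed (static) buffer rather than passed freely, because the captured graph always reads and writes the same memory.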

This release moves several features of distributed training from beta to stable, including: the Remote module, which provides transparent RPC for subclasses of nn.Module; DDP Communication Hook, which allows overriding how distributed data parallel communicates gradients across processes; and ZeroRedundancyOptimizer, which reduces the amount of memory required during training. The torch.special module, which provides APIs and functions similar to the SciPy special module, also moves to stable.
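The newly stable torch.special module can be exercised directly; as a small sketch, the functions below mirror their SciPy counterparts by name but accept and return tensors:

```python
import torch

x = torch.tensor([0.0, 1.0])

erf_vals = torch.special.erf(x)        # Gauss error function
expit_vals = torch.special.expit(x)    # logistic sigmoid
gammaln_vals = torch.special.gammaln(  # log of the absolute Gamma function
    torch.tensor([1.0, 2.0]))
```

Because the functions operate on tensors, they compose with autograd and GPU execution like any other PyTorch op, which is the main draw over calling into SciPy.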

The FX module, a "Pythonic platform for transforming and lowering PyTorch programs," has moved from beta to stable. Its three main components are a symbolic tracer, an intermediate representation, and a Python code generator. These components allow developers to convert a Module subclass to a Graph representation, modify the Graph in code, then convert the new Graph to Python source code that is automatically compatible with the existing PyTorch eager-execution system. The goal of FX is to let developers write custom transformations of their own code; for example, to perform operator fusion or to insert instrumentation.
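The three components can be seen in a short sketch: trace a module into a Graph, rewrite the Graph, and regenerate Python code. The toy transform here (swapping relu for sigmoid) merely stands in for a real one such as operator fusion; the module name is arbitrary.

```python
import torch
import torch.fx as fx

class AddOne(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x) + 1.0

# 1. Symbolic tracer: record the ops into a GraphModule.
traced = fx.symbolic_trace(AddOne())

# 2. Intermediate representation: rewrite the Graph in place,
#    retargeting the relu call to sigmoid.
for node in traced.graph.nodes:
    if node.op == "call_function" and node.target is torch.relu:
        node.target = torch.sigmoid
traced.graph.lint()  # sanity-check the modified Graph

# 3. Code generator: regenerate eager-compatible Python source.
traced.recompile()
out = traced(torch.tensor(0.0))  # sigmoid(0) + 1 = 1.5
```

After `recompile()`, the transformed module behaves like any hand-written `nn.Module`, which is what makes FX transforms compatible with the existing eager-execution system.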

In a discussion about the release on Hacker News, one user speculated that features such as FX indicate that PyTorch is becoming more like JAX. Horace He, a PyTorch developer, replied:

FX is more of a toolkit for writing transforms over your FX modules than "moving in JAX's direction" (although there are certainly some similarities!) It's not totally clear what "JAX's direction" means to you, but I'd consider its defining characteristics as 1. composable transformations, and 2. a functional way of programming (related to its function transformations). I'd say that Pytorch is moving towards the first but not the second.

JAX was open-sourced by Google in 2019 and is described as a library for "high-performance machine learning research." Google's main deep-learning framework, TensorFlow, is the chief rival of PyTorch, and released version 2.6 earlier this year.

The PyTorch 1.10 release notes and code are available on GitHub.
