PyTorch 1.6 Released; Microsoft Takes over Windows Version

PyTorch, Facebook's open-source deep-learning framework, announced the release of version 1.6, which includes new APIs and performance improvements. Along with the release, Microsoft announced that it will take over development and maintenance of the Windows version of the framework.

In a recent blog post, the PyTorch team highlighted the major features of the release. Native automatic mixed precision (AMP) training can produce memory savings of up to 50% on Tensor Core GPUs, and a beta version of a new memory profiler assists in debugging memory issues. The two distributed training paradigms supported by PyTorch, data-parallel and model-parallel, can now be used together, and the RPC framework includes a new backend that supports TensorPipe. In a separate blog post, a team from Microsoft announced Microsoft's expanded role in the PyTorch community, which includes ownership of the PyTorch for Windows build, improvements to test coverage, and improved documentation. According to the team:

Microsoft is happy to bring its Windows expertise to the table and bring PyTorch on Windows to its best possible self.

A Windows version of PyTorch was requested in January 2017 and was included in the 0.3.0 release in December of that year. As with many open-source projects, much of the development work for Windows support was contributed by the community, notably Jiachen Pu. However, limited resources often meant that test coverage was lacking, tutorials were out of date, and some features, such as distributed training, simply weren't available. Microsoft's efforts have brought test coverage "up to par" with the Linux version for the core PyTorch libraries and the domain libraries TorchText, TorchAudio, and TorchVision. The next areas of work will include support for distributed training and installation via pip.

PyTorch supports two types of distributed training: data-parallel, in which full replicas of a model are trained on many machines, each with a partition of the training data, and model-parallel, in which different parts of the model are trained on different machines. Model-parallel training is useful for extremely large models that cannot fit into the memory of a single machine. Data-parallel training speeds up the entire training process by distributing the work across many machines; combining it with model-parallel training can result in even faster training. In version 1.4, PyTorch introduced a distributed remote procedure call (RPC) system that supports model-parallel training, but until the 1.6 release, the two types of distributed training could not be used together. The new release also updates the RPC system to include support for TensorPipe, a "tensor-aware point-to-point communication primitive" that improves data transfer between the machines in the distributed-training cluster.
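The difference between the two paradigms can be illustrated with a small, framework-free sketch. This is not PyTorch's API: the function names (`grad_mse`, `data_parallel_grad`, `model_parallel_forward`), the one-parameter model, and the toy data are all assumptions made for illustration, and the "workers" are simulated in a single process.

```python
from statistics import fmean


def grad_mse(w, shard):
    """Gradient of mean squared error for the toy 1-D model y = w * x."""
    return fmean(2 * (w * x - y) * x for x, y in shard)


def data_parallel_grad(w, data, num_workers):
    """Data-parallel: every simulated worker holds the full model (w)
    and computes a gradient on its own partition of the data."""
    shards = [data[i::num_workers] for i in range(num_workers)]
    local_grads = [grad_mse(w, shard) for shard in shards]
    # Averaging stands in for the gradient synchronization step.
    return fmean(local_grads)


def model_parallel_forward(x, stages):
    """Model-parallel: each simulated worker holds one part (stage) of the
    model, and the intermediate result is passed from one to the next."""
    for stage in stages:
        x = stage(x)
    return x


# Hypothetical toy data drawn from y = 2 * x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]

full_grad = grad_mse(0.0, data)
parallel_grad = data_parallel_grad(0.0, data, num_workers=2)
two_stage_output = model_parallel_forward(3.0, [lambda v: 2 * v, lambda v: v + 1])
```

With equally sized partitions, the averaged per-worker gradient matches the gradient computed over the whole data set, which is why replicas trained this way stay in step. In a real cluster, the averaging and the hand-off between stages are network operations, which is where a communication layer such as the RPC system comes in.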

Mixed precision training is a technique that uses different numbers of bits to represent different values during training; for example, some layers or operations might require a full 32-bit floating point number, while others may need only 16 bits. PyTorch 1.6 supports automatic mixed precision (AMP) training, which automatically selects the appropriate precision for each value. When used on Tensor Core GPUs, this can result in memory savings of up to 50%. Also included in the release is a beta addition to the Autograd profiler, which can report both CPU and GPU memory usage inside the model.
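The "up to 50%" figure follows from the storage sizes involved: a 16-bit float takes half the bytes of a 32-bit float. The sketch below works through that arithmetic using only Python's standard `struct` module. It is an illustration, not PyTorch's AMP implementation; the helper names (`mixed_precision_savings`, `roundtrip_fp16`) and the value counts are assumptions.

```python
import struct

BYTES_FP32 = struct.calcsize("f")  # 32-bit float: 4 bytes
BYTES_FP16 = struct.calcsize("e")  # 16-bit float: 2 bytes


def mixed_precision_savings(num_values, fp16_fraction):
    """Fraction of memory saved when `fp16_fraction` of the values are
    stored in 16 bits and the rest stay in 32 bits."""
    full_bytes = num_values * BYTES_FP32
    num_fp16 = int(num_values * fp16_fraction)
    mixed_bytes = num_fp16 * BYTES_FP16 + (num_values - num_fp16) * BYTES_FP32
    return 1 - mixed_bytes / full_bytes


def roundtrip_fp16(value):
    """Store a Python float as a 16-bit float and read it back."""
    return struct.unpack("e", struct.pack("e", value))[0]


best_case = mixed_precision_savings(1_000_000, 1.0)   # every value in 16 bits
half_case = mixed_precision_savings(1_000_000, 0.5)   # half the values in 16 bits

# Very small magnitudes cannot be represented in 16 bits and become zero,
# one reason some values still need the full 32 bits.
tiny = roundtrip_fp16(1e-8)
```

Here `best_case` is 0.5 and `half_case` is 0.25, so 50% is the ceiling reached only if every value can be held in 16 bits; the actual saving depends on how many values must remain at full precision.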

In a discussion on Reddit, one user noted:

The new profiling tools look like something I've wanted for quite some time - one of the most annoying parts about PyTorch is the mysterious memory leaks.

The PyTorch source code and version 1.6 release notes are available on GitHub.
