Meta and AWS will work together to improve the performance of applications that customers run with PyTorch on AWS, and to accelerate how developers build, train, deploy, and operate artificial intelligence and machine-learning models.
PyTorch is an open-source deep-learning framework that makes it easy to develop machine-learning models and deploy them to production. PyTorch also provides dynamic computation graphs and libraries for distributed training, which are tuned for high performance on AWS.
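To make the phrase "dynamic computation graph" concrete, here is a toy, pure-Python sketch of the define-by-run idea. It is not PyTorch's implementation: the `Value` class and the `forward` function are hypothetical names invented for illustration, and the sketch handles only scalar multiplication. The point is that ordinary Python control flow decides the shape of the graph on each call, and gradients follow whatever graph was actually recorded.

```python
class Value:
    """Toy scalar that remembers how it was computed (illustrative only)."""

    def __init__(self, data, parents=(), grad_fns=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents      # Values this one was computed from
        self._grad_fns = grad_fns    # local derivative for each parent

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(
            self.data * other.data,
            (self, other),
            (lambda g: g * other.data, lambda g: g * self.data),
        )

    def backward(self, grad=1.0):
        # Accumulate the incoming gradient, then push it to the parents.
        self.grad += grad
        for parent, fn in zip(self._parents, self._grad_fns):
            parent.backward(fn(grad))


def forward(w, x):
    """Double w * x until it reaches 10; the loop count depends on the data."""
    h = w * x
    steps = 0
    while h.data < 10.0 and steps < 5:
        h = h * 2.0   # each iteration adds one more node to the graph
        steps += 1
    return h, steps


w_small = Value(1.5)
out_small, steps_small = forward(w_small, 1.0)   # loop runs three times
out_small.backward()

w_large = Value(6.0)
out_large, steps_large = forward(w_large, 1.0)   # loop runs once
out_large.backward()
```

With these inputs the two calls record graphs of different depth, so `w_small.grad` is 8.0 while `w_large.grad` is 2.0. In PyTorch itself the same pattern applies to tensors created with `requires_grad=True`.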
PyTorch on AWS is designed to take advantage of Amazon EC2 instances, Elastic Fabric Adapter, and other AWS storage, networking, and infrastructure technologies. In addition, PyTorch on AWS provides a rich ecosystem of tools and models that extend PyTorch, including torchvision, torchaudio, torchtext, torchelastic, and torch_xla.
TorchServe, PyTorch's model-serving library, is easy to use both for developers getting models ready for production and for ops engineers deploying containers in production. TorchServe supports both eager mode and TorchScript, and comes with default handlers for the most commonly deployed models, so they can be deployed with zero code changes. Its features include multi-model serving, model versioning for A/B testing, metrics for monitoring, and RESTful endpoints for application integration. TorchServe supports any machine-learning environment, including Amazon SageMaker, Kubernetes, and Amazon EKS.
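As a rough illustration of the RESTful endpoints and model versioning mentioned above, the sketch below builds (but does not send) two HTTP requests using only the Python standard library. It assumes a TorchServe instance on its default local ports, 8080 for inference and 8081 for management; the model name `classifier`, the archive name `classifier.mar`, the version string, and the helper function names are all hypothetical.

```python
from typing import Optional
from urllib.parse import quote, urlencode
from urllib.request import Request

# Assumed default TorchServe addresses; adjust for a real deployment.
INFERENCE_ADDR = "http://localhost:8080"
MANAGEMENT_ADDR = "http://localhost:8081"


def register_request(archive_url: str, initial_workers: int = 1) -> Request:
    """Build a management-API request that registers a model archive."""
    query = urlencode({"url": archive_url, "initial_workers": initial_workers})
    return Request(f"{MANAGEMENT_ADDR}/models?{query}", method="POST")


def predict_request(model: str, payload: bytes,
                    version: Optional[str] = None) -> Request:
    """Build an inference request, optionally pinned to one model version."""
    path = f"/predictions/{quote(model)}"
    if version is not None:
        path += f"/{quote(version)}"   # e.g. route A/B traffic by version
    return Request(INFERENCE_ADDR + path, data=payload, method="POST")


# Hypothetical model archive and payload, for illustration only.
register = register_request("classifier.mar")
default_call = predict_request("classifier", b"example input")
pinned_call = predict_request("classifier", b"example input", version="1.0")
```

Passing any of these objects to `urllib.request.urlopen` would send the request, which requires a running server with the model archive available. Pinning a version in the URL is one way an application could split traffic between two versions of the same model.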
Meta is enabling PyTorch on AWS to orchestrate large-scale training jobs across a distributed system of AI accelerators. This will make it easier for developers to build large-scale deep-learning models for natural-language processing and computer vision.
The companies will work together to offer native tools to improve the performance, explainability, and cost of inference on PyTorch.
This partnership broadens the relationship that Meta and AWS have built over the last five years. Meta currently uses AWS infrastructure and capabilities to complement its on-premises infrastructure, and will broaden its use of AWS compute, storage, databases, and security services to provide privacy, reliability, and scale in the cloud.