
A Modern Compute Stack for Scaling Large AI, ML, & LLM Workloads at QCon SF

At QCon San Francisco, Jules Damji, a lead developer advocate at Anyscale and co-author of Learning Spark (2nd Edition), discussed the difficulties data scientists encounter when scaling and managing infrastructure for machine learning models. He emphasized the necessity for a framework that can support the latest machine learning libraries, is easily manageable, and can scale to accommodate large datasets and models. Damji introduced Ray, a general-purpose framework, as a potential solution.

Ray for Python

Damji asserted that Ray enables the scaling of any Python application and includes libraries for data ingestion, training, serving, and more.  It aims to assist data scientists in addressing the challenges of current machine learning models, including the complexities of scaling and managing infrastructure.

Damji provided a detailed explanation of Ray, discussing its core abstractions of tasks, actors, and objects. He also talked about the various libraries that come with Ray, each corresponding to a specific function or workload in the machine learning or data space. These libraries, he noted, work together, enabling users to construct a complete machine learning pipeline with Ray.

Ray Core

Ray tasks are asynchronous functions that execute on separate Python workers in the Ray framework. They are invoked with the `remote` method, and their results are retrieved with `ray.get`. Ray tasks enable parallel computation, support multiple return values, can declare resource requirements, and provide options for fault tolerance and scheduling. More advanced uses of Ray tasks include nested remote functions and generators.

Ray actors are stateful workers or services that extend the Ray API from functions to classes. Each actor runs in its own Python process and methods of the actor are scheduled on that specific worker, allowing them to access and mutate the state of the worker. Actors can be interacted with by calling their methods with the remote operator, and methods called on the same actor are executed serially, sharing state with one another. Named actors allow you to retrieve the actor from any job in the Ray cluster.

Ray objects, also known as remote objects, are computed and created by tasks and actors in the Ray cluster. They are referenced using object refs, which are essentially pointers or unique IDs that can refer to a remote object without revealing its value. These objects are immutable; they can be created by remote function calls or by `ray.put()` and fetched with `ray.get()`. When the object store fills up, objects are spilled to Ray's temporary directory on the local filesystem by default.

Usage of Ray

Summing up, Damji said: "The takeaway from that particular talk was that today, people actually want the simplicity of yesterday and the blessings of scale of today. And that's what I covered... Scaling is as normal as breathing air. You need scale if you build large-scale applications, especially today with ML."

Ray is currently being used in production by several companies and has a large user community. Damji pointed out Ray's compatibility with other machine learning frameworks, such as PyTorch and TensorFlow, and its potential use in fine-tuning large language models.

The current release, Ray 2.7, brings updates and improvements to the Ray libraries and to KubeRay for Kubernetes, including stability enhancements. One new feature is RayLLM, which enables serving open-source LLMs with Ray Serve. Other updates include simplified APIs in Ray Train, stabilization of Ray Serve and KubeRay, and initial accelerator support for specific hardware configurations. In addition, Ray 2.7 provides configurations for popular open-source models through Ray Serve, aiming to streamline the path to production.

In a demonstration, Damji ran a model on a dataset in a notebook, illustrating how Ray can train a model across multiple GPUs. The demonstration highlighted Ray's support for model parallelism and its integration with other machine learning frameworks. He clarified that Ray is not intended to replace data processing frameworks but rather serves as a tool for data ingestion for training.

Those interested in learning more were invited to join the Ray community or the #LLM channel on the Ray Slack.
