
InfoQ News

Google OpenRL is an Experimental Self-hosted API for LLM Post-Training Fine-tuning


Google's GKE Labs has introduced OpenRL, an open-source project that provides a self-hosted API for post-training and fine-tuning Large Language Models (LLMs) on standard Kubernetes clusters.

OpenRL separates reinforcement learning (RL) infrastructure from AI research, allowing machine learning teams to scale post-training workflows on their own clusters, according to Google.

According to Google engineers, when working with agentic reinforcement learning on LLMs, "it is incredibly easy to get bogged down in system complexity". Even a single RL loop requires juggling many moving parts: data preparation and cleaning, environment selection, training loop debugging, reward design, handling inference inconsistencies, provisioning hardware, and managing the underlying infrastructure.

Each of these is a hard problem on its own. What makes the overall task harder still is how tightly AI research and infrastructure concerns are intertwined in today's tooling and frameworks.

Google engineers argue that decoupling infrastructure from AI research makes these challenges more manageable and lets specialized teams focus on their own domains, much as Kubernetes abstracts away infrastructure and simplifies workflows for application developers and reliability engineers.

One way OpenRL makes post-training fine-tuning more efficient is by running multiple RL jobs on the same infrastructure, which increases overall GPU utilization. According to Google researchers, traditional RL loops are strictly sequential, which often leaves GPUs idle while CPU- or network-bound tasks, especially reward calculation, run to completion.
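To make the utilization argument concrete, here is a back-of-the-envelope model that is not taken from OpenRL. It assumes each RL step spends `gpu_seconds` on GPU work and `reward_seconds` on CPU- or network-bound reward calculation, and that independent jobs can perfectly overlap one job's reward phase with another job's GPU phase. The names `StepCost` and `gpu_utilization` and the timings used are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepCost:
    """Hypothetical per-step timing for one RL job (illustrative, not measured)."""
    gpu_seconds: float     # generation + gradient update; GPU busy
    reward_seconds: float  # CPU/network-bound reward calculation; GPU idle for this job


def gpu_utilization(cost: StepCost, concurrent_jobs: int) -> float:
    """Idealized steady-state GPU utilization when `concurrent_jobs`
    independent RL loops share one GPU.

    A strictly sequential loop keeps the GPU busy for gpu_seconds out of
    every gpu_seconds + reward_seconds. With k interleaved loops, one loop's
    reward phase can overlap another loop's GPU phase, so utilization grows
    linearly in k until the GPU saturates at 1.0. Scheduling overhead and
    memory limits are ignored.
    """
    if concurrent_jobs < 1:
        raise ValueError("concurrent_jobs must be at least 1")
    cycle_seconds = cost.gpu_seconds + cost.reward_seconds
    return min(1.0, concurrent_jobs * cost.gpu_seconds / cycle_seconds)


if __name__ == "__main__":
    # Illustrative numbers: reward calculation takes 3x as long as GPU work.
    cost = StepCost(gpu_seconds=2.0, reward_seconds=6.0)
    for jobs in (1, 2, 4, 8):
        print(f"{jobs} job(s): {gpu_utilization(cost, jobs):.0%} GPU utilization")
```

With these made-up timings the script prints 25%, 50%, 100%, and 100%: a single sequential loop leaves the GPU idle three quarters of the time, while four interleaved jobs saturate it in this idealized model. Real schedulers also pay for memory pressure and context switching, which the sketch ignores.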

Additionally, Google notes that OpenRL improves the user experience by clearly separating responsibilities: researchers can focus on developing the RL loop, while engineers handle executing and scaling the post-training fine-tuning workflows.

When you are doing R&D, you do not have to run the RL loop directly on the machines with GPUs; you can simply run your RL loop on your Mac, pointing it to the training APIs running on a Kubernetes cluster or VMs.
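This workflow is essentially a client-server split: the RL loop (sampling, reward computation, deciding what to train on) runs wherever the researcher is, and only the GPU-heavy calls go to the cluster. The sketch below illustrates that split under loudly stated assumptions: `TrainingBackend`, `HttpTrainingBackend`, the `/sample` and `/train_step` routes, and their JSON payloads are all hypothetical and are not OpenRL's actual API. `FakeTrainingBackend` stands in for the cluster so the loop can be exercised locally.

```python
import json
import urllib.request
from typing import Callable, List, Protocol, Sequence


class TrainingBackend(Protocol):
    """Hypothetical interface to a remote post-training service (not OpenRL's API)."""

    def sample(self, prompts: Sequence[str]) -> List[str]: ...

    def train_step(self, prompts: Sequence[str], completions: Sequence[str],
                   rewards: Sequence[float]) -> float: ...


class HttpTrainingBackend:
    """Sends GPU-heavy work to a cluster over HTTP; routes and payloads are assumptions."""

    def __init__(self, base_url: str, timeout: float = 60.0) -> None:
        self.base_url = base_url.rstrip("/")
        self.timeout = timeout

    def _post(self, route: str, payload: dict) -> dict:
        request = urllib.request.Request(
            f"{self.base_url}{route}",
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=self.timeout) as response:
            return json.load(response)

    def sample(self, prompts: Sequence[str]) -> List[str]:
        return self._post("/sample", {"prompts": list(prompts)})["completions"]

    def train_step(self, prompts, completions, rewards) -> float:
        body = {"prompts": list(prompts), "completions": list(completions),
                "rewards": list(rewards)}
        return self._post("/train_step", body)["loss"]


class FakeTrainingBackend:
    """Local stand-in for the cluster, so the loop can be tested on a laptop."""

    def __init__(self) -> None:
        self.train_calls = 0

    def sample(self, prompts: Sequence[str]) -> List[str]:
        return [p[::-1] for p in prompts]  # pretend "model": reverse the prompt

    def train_step(self, prompts, completions, rewards) -> float:
        self.train_calls += 1
        return -sum(rewards) / len(rewards)  # pretend loss: negative mean reward


def rl_loop(backend: TrainingBackend, prompts: Sequence[str],
            reward_fn: Callable[[str, str], float], steps: int) -> List[float]:
    """Research-side loop: runs locally; only sample/train_step touch the GPUs."""
    losses = []
    for _ in range(steps):
        completions = backend.sample(prompts)
        rewards = [reward_fn(p, c) for p, c in zip(prompts, completions)]
        losses.append(backend.train_step(prompts, completions, rewards))
    return losses
```

Swapping `FakeTrainingBackend()` for `HttpTrainingBackend("http://<cluster-endpoint>")` (a placeholder address) would leave `rl_loop` unchanged, which is the separation of concerns described above. A real service would also need authentication, retries, and checkpoint management.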

The OpenRL repository also includes an autoresearch recipe that demonstrates how to run parallel experiments for parameter sweeps and how to refine the reward signal in a text-to-SQL workflow for Gemma models. Beyond its practical use, Google highlights the recipe as an example of how automation can streamline and scale AI research.
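The article does not say how the recipe scores generated SQL. One common approach, shown here purely as an assumption rather than as the recipe's actual reward, is execution matching: run the predicted and reference queries against the same database and compare their results. The hypothetical `valid_sql_bonus` parameter gives partial credit to queries that execute but return the wrong rows, the kind of reward-shaping knob a parallel sweep might tune.

```python
import sqlite3
from collections import Counter


def execution_reward(db: sqlite3.Connection, predicted_sql: str, gold_sql: str,
                     valid_sql_bonus: float = 0.0) -> float:
    """Execution-match reward for text-to-SQL (illustrative sketch).

    Returns 1.0 when the predicted query yields the same multiset of rows as
    the gold query (row order is ignored), `valid_sql_bonus` when it executes
    but returns different rows, and 0.0 when it fails to execute.
    Predicted SQL is untrusted: run it against a read-only copy of the data.
    """
    try:
        predicted_rows = db.execute(predicted_sql).fetchall()
    except (sqlite3.Error, sqlite3.Warning):
        return 0.0  # syntax errors, unknown tables/columns, multi-statement input
    gold_rows = db.execute(gold_sql).fetchall()
    if Counter(predicted_rows) == Counter(gold_rows):
        return 1.0
    return valid_sql_bonus


if __name__ == "__main__":
    # Tiny in-memory database for demonstration only.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE employees (name TEXT, dept TEXT)")
    db.executemany("INSERT INTO employees VALUES (?, ?)",
                   [("alice", "eng"), ("bob", "eng"), ("carol", "sales")])
    gold = "SELECT name FROM employees WHERE dept = 'eng'"
    candidates = [
        "SELECT name FROM employees WHERE dept = 'eng' ORDER BY name DESC",
        "SELECT name FROM employees",
        "SELEC name FROM employees",
    ]
    for sql in candidates:
        print(execution_reward(db, sql, gold, valid_sql_bonus=0.1), sql)
```

Ignoring row order is a deliberate simplification; a stricter variant would compare ordered lists whenever the gold query contains `ORDER BY`. For a file-backed database, `sqlite3.connect("file:<path>?mode=ro", uri=True)` opens it read-only so that untrusted `UPDATE` or `DELETE` statements cannot alter the data.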

OpenRL runs on macOS, on Nvidia GPUs, and on GKE. It also integrates with Tinker-Cookbook through its Tinker-compatible endpoint.

OpenRL is not the only effort focused on simplifying post-training fine-tuning through better separation of concerns. FeynRL, for example, separates the fine-tuning recipe from system logic, making it easier for researchers to develop and test new methods while still letting those methods scale with tools like DeepSpeed, Ray, and vLLM.
