Google's SEED RL Achieves 80x Speedup of Reinforcement-Learning

Researchers at Google Brain recently open-sourced their Scalable, Efficient Deep-RL (SEED RL) architecture for AI reinforcement learning. SEED RL is a distributed architecture that achieves state-of-the-art results on several RL benchmarks at lower cost and up to 80x faster than previous systems.

The team published a description of the SEED RL architecture and the results of several experiments in a paper accepted at the 2020 International Conference on Learning Representations (ICLR). The work addresses several drawbacks of existing distributed reinforcement-learning systems by moving neural-network inference to a central learner server, which can take advantage of GPU or TPU hardware accelerators. In benchmarks on DeepMind Lab environments, SEED RL achieved a frame rate of 2.4 million frames per second using 64 Cloud TPU cores---a rate 80x faster than the previous state-of-the-art system. In a blog post summarizing the work, lead author Lasse Espeholt says,

We believe SEED RL and the results presented demonstrate that reinforcement learning has once again caught up with the rest of the deep learning field in terms of taking advantage of accelerators.

Reinforcement learning (RL) is a branch of AI used to create systems that need to make action decisions---such as choosing which move to make in a game---as opposed to other systems that simply transform input data---for example, an NLP system that translates text from English to French. RL systems have the advantage that they do not need hand-labelled datasets as training input; instead, the learning system interacts directly with the target environment, for example, by playing hundreds or thousands of games. Deep RL systems incorporate a neural network, and in many cases can beat the best human players at a wide range of games, including StarCraft and Go.
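To make the interaction loop concrete, the following minimal Python sketch shows an agent repeatedly observing a toy environment, choosing an action, and receiving a reward. The environment and policy here are hypothetical stand-ins for illustration, not SEED RL code.

import random

class GridWorld:
    """Toy environment: reach position 5 starting from position 0."""
    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):            # action is +1 or -1
        self.pos += action
        done = self.pos >= 5
        reward = 1.0 if done else 0.0
        return self.pos, reward, done

def policy(state):
    # A trained policy would choose actions based on the state;
    # here we simply act randomly to illustrate the loop.
    return random.choice([-1, 1])

env = GridWorld()
state = env.reset()
done = False
while not done:
    action = policy(state)                   # choose an action
    state, reward, done = env.step(action)   # interact with the environment
    # the (state, action, reward) experience would be used to improve the policy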

As with other deep-learning systems, deep-RL AIs can be expensive and time-consuming to train. Current state-of-the-art efforts speed up the process by decomposing the system into a centralized learner and multiple actors. The actors and the learner all have a copy of the same neural network. The actors interact with the environment; in the case of a game-playing AI, the actors play the game by sensing the state of the game and executing the next action, which is chosen by the actor's neural network. Actors send their experience---the data they sensed from the game, the actions they chose, and the result of those actions---back to the learner, which updates the parameters of the shared neural network. The actors periodically refresh their copy of the network from the learner's latest version. The rate at which actors interact with the environment is called the frame rate, and it is a good measure of how quickly the system can be trained.
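The sketch below illustrates this classic actor-learner pattern in Python: each actor keeps a local copy of the parameters, runs inference on its own CPU during rollouts, ships trajectories to the learner, and periodically re-syncs its copy. The class and method names are illustrative only and do not reflect the actual SEED RL or IMPALA APIs.

import numpy as np

class Learner:
    def __init__(self, num_params=8):
        self.params = np.zeros(num_params)    # shared policy parameters

    def update(self, trajectories):
        # Stand-in for a gradient step on the batched experience.
        self.params += 0.01 * len(trajectories)

    def get_params(self):
        return self.params.copy()

class Actor:
    def __init__(self, learner):
        self.learner = learner
        self.params = learner.get_params()    # local copy of the network

    def act(self, observation):
        # Inference runs locally on the actor's CPU -- the compute bottleneck.
        return int(np.dot(self.params, observation) > 0)

    def rollout(self, steps=10):
        trajectory = []
        for _ in range(steps):
            obs = np.random.rand(len(self.params))   # fake environment frame
            action = self.act(obs)
            reward = float(action)                   # fake reward
            trajectory.append((obs, action, reward))
        return trajectory

    def sync(self):
        # Periodically pull the latest parameters -- the communication bottleneck.
        self.params = self.learner.get_params()

learner = Learner()
actors = [Actor(learner) for _ in range(4)]
for _ in range(3):                        # a few training iterations
    batch = [actor.rollout() for actor in actors]
    learner.update(batch)
    for actor in actors:
        actor.sync()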

There are several drawbacks to this architecture. In particular, maintaining a copy of the neural network on the actors introduces a communication bottleneck, and using the actors' CPUs for network inference is a compute bottleneck. The SEED RL architecture uses the centralized learner for both network training and inference. This eliminates the need to send neural-network parameters to the actors, and the learner can use hardware accelerators such as GPUs and TPUs to improve both learning and inference performance. Because the actors no longer need to use their resources for inference, they can run the problem environment at a higher frame rate. The system was benchmarked on the Google Research Football environment, the Arcade Learning Environment, and the DeepMind Lab environment. On the DeepMind Lab environment, SEED RL achieved a frame rate of 2.4 million frames per second using 64 Cloud TPU cores, a speedup of 80x, while also reducing cost by 4x. The system was also able to solve a previously unsolved task ("Hard") in the Google Research Football environment.
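The following Python sketch shows the SEED RL idea at a high level: actors only step the environment and stream observations to a central learner, which performs batched inference (and training) on an accelerator, returning actions to the actors. The names and interfaces here are illustrative assumptions, not the actual SEED RL code.

import numpy as np

class CentralLearner:
    def __init__(self, obs_dim=8):
        self.params = np.zeros(obs_dim)          # the model lives only here

    def infer(self, batched_obs):
        # One batched forward pass for all actors, suited to a GPU/TPU.
        logits = batched_obs @ self.params
        return (logits > 0).astype(int)

    def train(self, experience):
        self.params += 0.01                      # stand-in for a gradient step

class EnvActor:
    """Runs only the environment; holds no network and no parameter copy."""
    def __init__(self, obs_dim=8):
        self.obs_dim = obs_dim

    def observe(self):
        return np.random.rand(self.obs_dim)      # fake frame from the game

    def step(self, action):
        return float(action)                     # fake reward

learner = CentralLearner()
actors = [EnvActor() for _ in range(4)]
for _ in range(3):
    obs_batch = np.stack([a.observe() for a in actors])    # actors send observations
    actions = learner.infer(obs_batch)                      # central batched inference
    rewards = [a.step(act) for a, act in zip(actors, actions)]
    learner.train(list(zip(obs_batch, actions, rewards)))   # learner also trains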

Google Brain was founded as a Google X research collaboration between Google Fellow Jeff Dean and Stanford University professor Andrew Ng. In 2013, deep-learning pioneer Geoff Hinton joined the team. Much of Google Brain's research has been in natural-language processing (NLP) and perception tasks, whereas RL has typically been the focus of DeepMind, the RL startup acquired by Google in 2014, which developed the AlphaGo AI that defeated one of the best human Go players.

The source code for SEED RL is available on GitHub.
