Berkeley Researchers Announce Robot Training Algorithm DayDreamer


Researchers from the University of California, Berkeley, recently announced DayDreamer, a reinforcement learning (RL) algorithm that learns a world model of its environment. The world model lets it learn more quickly than RL on a physical robot alone, without requiring a simulator. Using DayDreamer, the team trained several physical robots to perform complex tasks in a matter of hours.

The model and several experiments are described in a paper published on arXiv. DayDreamer uses a collection of neural networks to learn a world model from the robot's interactions with its environment. The world model then allows the robot to "imagine" the results of a series of actions, and reinforcement learning on this imagined behavior trains a controller for the robot. Because the world model is learned from real-world interaction, no simulated environment is needed. Because RL runs on imagined behavior, the overall process is faster than RL on the physical robot alone. According to the Berkeley team:

Our aim is to push the limits of robot learning directly in the real world and offer a robust platform to enable future work that develops the benefits of world models for robot learning.
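
The loop described above has three parts: collect real transitions, fit a world model to them, and choose actions by imagining their outcomes inside that model. The toy sketch below shows that loop under heavy simplifying assumptions. The "robot" is a hypothetical point on a line (`TRUE_STEP`, `GOAL` and every function name are illustrative). The world model is a single least-squares parameter rather than a set of neural networks. Exhaustive search over short imagined action sequences stands in for DayDreamer's actor-critic learning.

```python
import itertools
import random

# Hypothetical toy "robot": a point on a line. The agent never reads TRUE_STEP;
# it has to infer the dynamics from the transitions it records.
TRUE_STEP = 0.5
GOAL = 3.0
ACTIONS = (-1, 0, 1)


def real_step(x, a):
    """One real interaction (slow and costly on a physical robot)."""
    return x + TRUE_STEP * a


def fit_world_model(buffer):
    """Least-squares fit of k in the model x' = x + k * a over all replayed data."""
    num = sum(a * (x_next - x) for x, a, x_next in buffer)
    den = sum(a * a for _, a, _ in buffer)
    return num / den if den else 0.0


def imagined_return(x, actions, k):
    """Score an action sequence purely inside the learned model (no real steps)."""
    total = 0.0
    for a in actions:
        x = x + k * a
        total -= abs(x - GOAL)  # reward: negative distance to the goal
    return total


def plan(x, k, horizon=3):
    """Imagine every action sequence of length `horizon`; return the best first action."""
    best = max(itertools.product(ACTIONS, repeat=horizon),
               key=lambda seq: imagined_return(x, seq, k))
    return best[0]


def run(steps=12, seed=0):
    rng = random.Random(seed)
    buffer = []  # replay buffer of real (x, a, x_next) transitions
    k = 0.0      # learned world-model parameter (unknown before any data)
    x = 0.0
    for _ in range(steps):
        # With no data yet, act randomly; afterwards, choose actions by imagining.
        a = rng.choice(ACTIONS) if not buffer else plan(x, k)
        x_next = real_step(x, a)     # the only contact with the "real" system
        buffer.append((x, a, x_next))
        x = x_next
        k = fit_world_model(buffer)  # keep refining the model as data arrives
    return k, x
```

In this deterministic toy, `run()` recovers the step size `k = 0.5` from a handful of real transitions. It then reaches `GOAL`, using imagined rollouts rather than real trial and error to pick each action.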

RL is a common technique for developing robot control systems. However, training an RL controller purely on the physical robot in the real world can take a long time. On the other hand, training the controller only in a simulated environment can produce a controller that cannot handle the complexity and dynamics of the real world. Simulation also requires building the simulated environment itself, which adds development time and cost.

DayDreamer combines the fast learning of simulation with the generality of learning from the real world by using neural networks to learn a world model. In effect, the robot learns its own simulator of its environment. The world model uses an encoder network to map sensor data to a lower-dimensional representation, and a dynamics network to predict how this representation changes in response to motor actions. A reward network predicts the reward associated with a state for the task being learned.
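
The data flow through these three components can be sketched as follows. The sketch assumes a `WorldModel` container with an `imagine` method, both hypothetical names. The encoder touches real sensor data once, after which the dynamics and reward components roll the compact state forward without any further sensor input. In DayDreamer each component is a neural network trained on real robot data. Here they are hand-written stand-ins: an assumed 4-sensor observation, an assumed time step `DT`, and an assumed task of holding position 1.0.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


@dataclass
class WorldModel:
    """The three learned parts of a world model (plain callables in this sketch)."""
    encoder: Callable[[Vector], Vector]           # sensor reading -> compact latent state
    dynamics: Callable[[Vector, Vector], Vector]  # (latent, action) -> next latent
    reward: Callable[[Vector], float]             # latent -> predicted task reward

    def imagine(self, observation: Vector,
                actions: Sequence[Vector]) -> Tuple[List[Vector], List[float]]:
        z = self.encoder(observation)  # real sensor data is used only once, here
        latents, rewards = [z], []
        for a in actions:
            z = self.dynamics(z, a)    # predicted by the model, not measured
            latents.append(z)
            rewards.append(self.reward(z))
        return latents, rewards


# Hypothetical hand-written stand-ins for the trained networks.
DT = 0.5  # assumed control time step


def encoder(obs: Vector) -> Vector:
    # Assume 4 sensors: two redundant position readings, two velocity readings.
    return [(obs[0] + obs[1]) / 2, (obs[2] + obs[3]) / 2]


def dynamics(z: Vector, action: Vector) -> Vector:
    position, velocity = z
    return [position + velocity * DT, velocity + action[0] * DT]


def reward(z: Vector) -> float:
    return -abs(z[0] - 1.0)  # assumed task: hold position 1.0


model = WorldModel(encoder, dynamics, reward)
latents, rewards = model.imagine([0.0, 0.0, 0.0, 0.0], [[1.0], [1.0]])
```

Two imagined pushes move the latent state from `[0.0, 0.0]` to `[0.25, 1.0]` with predicted rewards `[-1.0, -0.75]`. The robot never moves while these predictions are made. That is what makes the learned model usable as a simulator.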

Figure: DayDreamer architecture

An actor-critic RL algorithm uses this world model to learn control behaviors. The world model lets the algorithm explore many possible behaviors in parallel, whereas running RL directly on the hardware means only one behavior can be performed at any given time. As a result, RL runs at speeds comparable to those achieved in a simulated environment.
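
The earlier published Dreamer agents train the critic toward λ-returns computed over a batch of imagined trajectories, and train the actor to choose actions that increase them. The sketch below covers only the critic-target step under that assumption. It rolls out one imagined trajectory per start state and computes the targets with the recursion `R_H = v_H`, `R_t = r_t + gamma * ((1 - lam) * v_{t+1} + lam * R_{t+1})`. The model, random actor and untrained critic are hypothetical stand-ins. The actor and critic gradient updates are omitted.

```python
import random
from typing import List, Sequence


def lambda_returns(rewards: Sequence[float], values: Sequence[float],
                   gamma: float = 0.99, lam: float = 0.95) -> List[float]:
    """Critic targets for one imagined trajectory.

    rewards[t] is the predicted reward for the step from state t to t+1
    (t = 0..H-1); values[t] is the critic's estimate for state t (t = 0..H).
    """
    horizon = len(rewards)
    assert len(values) == horizon + 1
    ret = values[horizon]  # bootstrap from the critic at the imagination horizon
    targets = [0.0] * horizon
    for t in reversed(range(horizon)):
        ret = rewards[t] + gamma * ((1 - lam) * values[t + 1] + lam * ret)
        targets[t] = ret
    return targets


def imagine_batch(model_step, policy, critic, start_states, horizon, rng):
    """One imagined trajectory per start state; the real robot is never touched.

    A real implementation would process the whole batch at once on a GPU;
    this Python loop only mirrors that batch dimension.
    """
    batch = []
    for s in start_states:
        states, rewards = [s], []
        for _ in range(horizon):
            a = policy(s, rng)
            s, r = model_step(s, a)
            states.append(s)
            rewards.append(r)
        batch.append(lambda_returns(rewards, [critic(x) for x in states]))
    return batch


# Hypothetical stand-ins: a learned 1-D model, a random actor, an untrained critic.
def model_step(s, a):
    s_next = s + 0.5 * a
    return s_next, -abs(s_next - 3.0)


def policy(s, rng):
    return rng.choice((-1, 0, 1))


def critic(s):
    return 0.0


targets = imagine_batch(model_step, policy, critic,
                        start_states=[0.0] * 64, horizon=15, rng=random.Random(0))
```

Setting `lam=1` gives the discounted sum of imagined rewards plus the final critic value. Setting `lam=0` gives one-step bootstrapped targets, so `lam` trades the model's long-horizon predictions against the critic's estimates.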

The Berkeley team deployed DayDreamer on four different robots, each with a different target task. An A1 quadruped learned to walk, starting from lying on its back. A UR5 manipulator and an XArm manipulator each learned a pick-and-place task, and a Sphero Ollie mobile robot learned a navigation task. The pick-and-place tasks took around 10 hours to learn, while the Sphero navigation task took only two hours. The quadruped needed only one hour to learn to roll off its back and walk; previous research had required extensive simulation and transfer learning to accomplish this task.

Co-author Danijar Hafner tweeted about the work and answered several users' questions. In response to one user who was surprised at the lack of "hacks" required to control the robots, Hafner replied:

Haha, no hacks! The only thing that felt a bit complicated is the reward function for walking, which contains multiple terms as described in the paper. But I learned robotics papers often use even more complicated ones.

The latest DayDreamer code has not yet been open-sourced, although according to co-author Alejandro Escontrela it will be soon. An older version of the Dreamer algorithm is available on GitHub.
