
DeepMind's AI Defeats Top StarCraft Players


DeepMind's AlphaStar AI program recently defeated two top professional StarCraft players 5-0.

The DeepMind team wrote about their StarCraft II-playing AI program called AlphaStar. The AI program played against two highly-ranked professional players, defeating both 5 games to 0. Although researchers have been developing AI for playing StarCraft since 2009, in annual competitions against human players "[even] the strongest bots currently play at an amateur human level."

Teaching an AI program to play real-time strategy (RTS) games is challenging, for many reasons. First, unlike classic strategy games such as chess or Go, players cannot see the state of the entire game at any time. The effects of actions may not pay off for a long time, and players must act continuously in real time instead of making single moves in alternating turns. Also, the game's action space is much larger: instead of a handful of "pieces" that may make a well-defined set of legal moves, StarCraft games can contain dozens of buildings and hundreds of units, which can be grouped and controlled hierarchically.
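The gap in scale can be made concrete with some back-of-the-envelope arithmetic. The numbers below are illustrative assumptions, not measurements from the game: chess is commonly cited as having roughly 35 legal moves per position, while an RTS player chooses among many units, each with several possible orders, many of which target a point on the screen.

```python
import math

# Rough, illustrative comparison of per-move choices. All RTS numbers here
# are assumptions for the sake of the estimate, not values from StarCraft II.
chess_branching = 35   # commonly cited average legal moves in chess

units = 100            # assumed units under the player's control
orders = 10            # assumed distinct orders per unit (move, attack, build, ...)
targets = 84 * 84      # assumed screen-coordinate targets for a spatial order

rts_choices = units * orders * targets
print(rts_choices)                                  # ~7 million single actions
print(math.log10(rts_choices / chess_branching))    # ~5 orders of magnitude larger
```

Even this toy estimate ignores that an RTS player issues such actions continuously, so the effective space of strategies is far larger still.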

In 2017 DeepMind blogged about their partnership with Blizzard Entertainment, the makers of StarCraft, in developing AI for playing the game. DeepMind open-sourced PySC2, a Python wrapper around Blizzard's StarCraft II API, as part of their research efforts. This latest announcement is an update on the results of their work.
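A wrapper like PySC2 exposes the game to an agent as an observe/act loop: the environment returns an observation each step, and the agent returns an action. The stub below only illustrates that loop; `StubEnv`, its toy observation dictionary, and the `no_op` action are hypothetical stand-ins, not PySC2's actual classes or method signatures.

```python
# Hypothetical stub illustrating the observe/act cycle an environment wrapper
# such as PySC2 provides. None of these names come from the real PySC2 API.
class StubEnv:
    def __init__(self, episode_len=3):
        self.t, self.episode_len = 0, episode_len

    def reset(self):
        self.t = 0
        return {"minerals": 50, "units": []}        # toy observation

    def step(self, action):
        self.t += 1
        done = self.t >= self.episode_len           # episode ends after N steps
        obs = {"minerals": 50 + 5 * self.t, "units": []}
        return obs, done

def run_agent(env, policy):
    """Run one episode, letting `policy` map each observation to an action."""
    obs = env.reset()
    while True:
        obs, done = env.step(policy(obs))
        if done:
            return obs

final = run_agent(StubEnv(), policy=lambda obs: "no_op")
```

In the real library the observation is a rich structure of feature layers and unit data rather than a small dictionary, but the control flow an agent implements is the same.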

AlphaStar uses a deep neural network to control its behavior; the inputs to the network are data from the game interface, and the outputs are commands to the game. Although the full technical details were not published, the blog post does say that the network consists of "a transformer torso to the units (similar to relational deep reinforcement learning), combined with a deep LSTM core, an auto-regressive policy head with a pointer network, and a centralised [sic] value baseline."
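Since the full architecture is unpublished, the components named in the quote can only be sketched. The NumPy fragment below is a minimal, assumed illustration of how such pieces might compose: one self-attention pass over per-unit features (the "transformer torso"), one LSTM cell step (the "core"), and a pointer-style head that scores each unit against the LSTM state. All dimensions and weights are arbitrary; this is not AlphaStar's network.

```python
import numpy as np

# Toy sketch (assumed, not AlphaStar): attention over units -> LSTM step ->
# pointer-style head that selects one unit. Weights are random placeholders.
rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(units, W_q, W_k, W_v):
    # One head of scaled dot-product self-attention over unit feature vectors
    q, k, v = units @ W_q, units @ W_k, units @ W_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores) @ v

def lstm_step(x, h, c, W, U, b):
    # Standard LSTM cell: input, forget, output gates and candidate state
    z = x @ W + h @ U + b
    i, f, o, g = np.split(z, 4)
    sig = lambda a: 1.0 / (1.0 + np.exp(-a))
    c = sig(f) * c + sig(i) * np.tanh(g)
    h = sig(o) * np.tanh(c)
    return h, c

n_units, d, dh = 5, 8, 16
units = rng.normal(size=(n_units, d))               # per-unit feature vectors
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))

torso = self_attention(units, W_q, W_k, W_v)        # "transformer torso"
pooled = torso.mean(axis=0)                         # summary fed to the core

W, U, b = rng.normal(size=(d, 4 * dh)), rng.normal(size=(dh, 4 * dh)), np.zeros(4 * dh)
h, c = lstm_step(pooled, np.zeros(dh), np.zeros(dh), W, U, b)  # "deep LSTM core"

# Pointer-style head: score each unit against the LSTM state to pick one
W_p = rng.normal(size=(dh, d))
unit_probs = softmax((h @ W_p) @ torso.T)
selected = int(unit_probs.argmax())
```

A pointer network of this kind lets the output refer back to a variable-length set of inputs, which suits a game where the number of controllable units changes constantly.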

The network was first trained using supervised learning on publicly-available sample games between human players. Then copies of this network, or agents, were used to create a multi-agent "league." They played against each other, improving their game using reinforcement-learning (RL) techniques. Over time, agents were frozen, and new copies of them were added to the league for improvement by RL. In this way, the system can explore new strategies, by training new agents from copies of old ones, while "remembering" previously learned strategies by keeping the agents that learned them unmodified. To train the league, DeepMind built a distributed system that ran for 14 days on Google's v3 TPUs, using 16 TPUs per agent. The final agent used in competition consists of "the most effective mixture of strategies" of the agents in the league.
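The league dynamic described above, periodically freezing a learner and adding a fresh copy that trains against the whole population, can be sketched with a toy simulation. The `Agent` class, the scalar `skill` value, and the win-probability and update rules below are all invented for illustration; they stand in for the real RL training, which the post does not detail.

```python
import random

random.seed(0)

# Toy sketch (assumed, not DeepMind's system) of league-style training:
# learners train against the whole league, then are frozen and added to it,
# so previously learned strategies remain available as opponents.
class Agent:
    def __init__(self, skill):
        self.skill = skill          # scalar stand-in for a learned policy

    def beats(self, opponent):
        # Win probability grows with the skill gap (toy model)
        return random.random() < self.skill / (self.skill + opponent.skill)

def train_league(rounds=5, matches=100):
    league = [Agent(skill=1.0)]
    for _ in range(rounds):
        learner = Agent(skill=league[-1].skill)     # start from newest agent
        for _ in range(matches):
            opponent = random.choice(league)        # face old and new strategies
            if not learner.beats(opponent):
                learner.skill *= 1.02               # toy "RL" update on a loss
        league.append(learner)                      # freeze; join the league
    return league

league = train_league()
```

Keeping frozen agents in the opponent pool is what prevents the population from forgetting strategies it has already beaten, the property the post attributes to the league design.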

DeepMind is preparing a full description of the work to appear in a peer-reviewed journal.
