
DeepMind's AI Defeats Top StarCraft Players


DeepMind's AlphaStar AI program recently defeated two top professional StarCraft II players, beating each of them 5-0.

The DeepMind team wrote about their StarCraft II-playing AI program called AlphaStar. The AI program played two different highly-ranked professional players, defeating both 5 games to 0. Although researchers have been developing AI for playing StarCraft since 2009, in annual competitions against human players "[even] the strongest bots currently play at an amateur human level."

Teaching an AI program to play real-time strategy (RTS) games is challenging, for many reasons. First, unlike classic strategy games such as chess or Go, players cannot see the state of the entire game at any time. The effects of actions may not pay off for a long time, and players must act continuously in real time instead of making single moves in alternating turns. Also, the game's action space is much larger: instead of a handful of "pieces" that may make a well-defined set of legal moves, StarCraft games can contain dozens of buildings and hundreds of units, which can be grouped and controlled hierarchically.

In 2017 DeepMind blogged about their partnership with Blizzard Entertainment, the makers of StarCraft, to develop AI for playing the game. As part of their research efforts, DeepMind open-sourced PySC2, a Python wrapper around Blizzard's StarCraft II API. This latest announcement is an update on the results of their work.
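Wrappers like PySC2 expose the game to learning agents as an observation/action loop. The following is a toy, self-contained schematic of that loop; the class and action names are illustrative stand-ins, not the real PySC2 API (which requires a StarCraft II installation).

```python
# Toy schematic of the observation/action loop a StarCraft II wrapper
# such as PySC2 provides to an agent. All names here are illustrative.

class MiniGameEnv:
    """Stand-in for a real-time strategy environment wrapper."""
    NO_OP, BUILD, ATTACK = 0, 1, 2

    def __init__(self, episode_length=10):
        self.episode_length = episode_length
        self.step_count = 0

    def reset(self):
        """Start an episode and return the agent's (partial) observation."""
        self.step_count = 0
        return {"minerals": 50, "units": 12}

    def step(self, action):
        """Apply one action; return (observation, reward, done)."""
        self.step_count += 1
        reward = 1.0 if action == self.ATTACK else 0.0
        done = self.step_count >= self.episode_length
        obs = {"minerals": 50 + 5 * self.step_count, "units": 12}
        return obs, reward, done

def run_episode(env, policy):
    """Run one episode, letting `policy` map observations to actions."""
    obs, total, done = env.reset(), 0.0, False
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return total

# A trivial scripted policy: always attack.
total = run_episode(MiniGameEnv(), lambda obs: MiniGameEnv.ATTACK)
```

The key design point this illustrates is that the agent only ever sees the observation dictionary, never the full game state, which is what makes RTS games partially observable.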

AlphaStar uses a deep neural network to control its behavior; the inputs to the network are data from the game interface, and the outputs are commands to the game. Although the full technical details were not published, the blog post does say that the network consists of "a transformer torso to the units (similar to relational deep reinforcement learning), combined with a deep LSTM core, an auto-regressive policy head with a pointer network, and a centralised [sic] value baseline."
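One unusual element of that architecture is the auto-regressive policy head with a pointer network: the action type is chosen first, then each argument is chosen conditioned on the earlier choices, with a pointer-style step selecting a target from a variable-length list of units. The sketch below illustrates only that selection structure in plain Python; the names and the hand-coded scoring are stand-ins for learned networks, not DeepMind's implementation.

```python
# Sketch of auto-regressive action selection with a pointer-style step.
# In AlphaStar these choices are made by neural networks; here, the action
# type is sampled uniformly and units are scored by a toy heuristic,
# purely to show the conditional, variable-length structure.
import random

def select_action(units, rng):
    # Step 1: choose an action type.
    action_type = rng.choice(["move", "attack", "build"])
    # Step 2: conditioned on the type, "point" at one unit out of a
    # variable-length list -- this is what a pointer network enables.
    if action_type in ("move", "attack"):
        scores = [u["hp"] for u in units]  # stand-in for learned scores
        target = max(range(len(units)), key=scores.__getitem__)
        return {"type": action_type, "target_unit": target}
    # "build" takes no unit target in this toy example.
    return {"type": action_type, "target_unit": None}

rng = random.Random(0)
units = [{"hp": 40}, {"hp": 90}, {"hp": 10}]
action = select_action(units, rng)
```

The pointer step matters because the number of units on a StarCraft map varies from moment to moment, so the policy cannot use a fixed-size output layer to pick targets.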

The network was first trained using supervised learning on publicly available sample games between human players. Then copies of this network, or agents, were used to create a multi-agent "league." They played against each other, improving their game using reinforcement-learning (RL) techniques. Over time, agents were frozen, and new copies of them were added to the league for improvement by RL. In this way, the system can explore new strategies, by training new agents from copies of old ones, while "remembering" previously learned strategies by keeping the agents that learned them unmodified. To train the league, DeepMind built a distributed system that ran for 14 days on Google's v3 TPUs, using 16 TPUs per agent. The final agent used in competition consists of "the most effective mixture of strategies" of the agents in the league.
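The freeze-and-copy league dynamic described above can be sketched in a few lines. In this toy version a single scalar "skill" stands in for a trained policy and a simple nudge stands in for an RL update; the function names and update rule are illustrative assumptions, not DeepMind's algorithm.

```python
# Toy sketch of league-style self-play training: a learner plays matches
# against agents sampled from the league, and a frozen copy of the learner
# is periodically added to the league so old strategies are "remembered."
import random

def train_league(generations=5, matches=20, seed=0):
    rng = random.Random(seed)
    # The league starts with the supervised-learning agent, kept frozen.
    league = [{"skill": 0.0, "frozen": True}]
    learner = {"skill": 0.0, "frozen": False}
    for _ in range(generations):
        for _ in range(matches):
            opponent = rng.choice(league)  # sample a past (frozen) agent
            # Stand-in for an RL update: nudge the learner's skill based
            # on the opponent it just played.
            learner["skill"] += 0.1 * (opponent["skill"] - learner["skill"]) + 0.05
        # Freeze a snapshot of the learner and add it to the league.
        league.append({"skill": learner["skill"], "frozen": True})
    return league

league = train_league()
```

Because frozen agents are never modified, a later learner can always be tested against every strategy the league has ever produced, which is what prevents the population from "forgetting" counters to older play styles.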

DeepMind is preparing a full description of the work to appear in a peer-reviewed journal.


Community comments

  • A huge relief!

    by Cameron Purdy

    Now I can have my computer play my games for me, so I can free up some time for work.

  • Can you translate this?

    by Chris Turner

    Some numeric clarifications are important here. How many "agents"? At 16 TPUv3 per agent, plus how many control VMs? How many hours of "experience" and how many kWh of energy built this seemingly fantastic AI model? Oh yeah, and how many programmers wrote the API, how many analysts ran it, and how many sysadmins built and maintained the infrastructure?
