AlphaGo: Google and DeepMind Publish Seminal AI Work
An article in Nature details Google DeepMind's AlphaGo, its two-neural-network design, and how it won 99% of its games against the strongest existing Go programs and defeated the professional player Fan Hui, marking the “first time that a computer program has defeated a human professional player in the full-sized game of Go”. Go has long been considered one of the great unsolved problems in AI.
AlphaGo achieved these high victory margins with a new algorithm that “combines Monte Carlo simulation with value and policy networks”. A Monte-Carlo tree search is guided by two deep neural networks: a policy network that suggests promising moves, and a value network that evaluates the position reached after each play. The policy network was first trained with supervised learning on recorded games of expert human players, and then further improved with reinforcement learning by playing against itself thousands of times over.
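To make the search procedure concrete, here is a minimal toy sketch of policy/value-guided tree search. It is not AlphaGo's actual algorithm or code: the `policy` and `value` functions are hypothetical stand-ins for the trained networks, and the selection rule is a simplified PUCT-style score of the kind used in this family of methods.

```python
import math, random

def policy(state, moves):
    # Hypothetical stand-in for the policy network: uniform priors.
    return {m: 1.0 / len(moves) for m in moves}

def value(state):
    # Hypothetical stand-in for the value network: random score in [-1, 1].
    return random.uniform(-1, 1)

class Edge:
    def __init__(self, prior):
        self.prior = prior      # P(s, a) from the policy network
        self.visits = 0         # N(s, a)
        self.total_value = 0.0  # W(s, a)

    def q(self):
        return self.total_value / self.visits if self.visits else 0.0

def select(children, c_puct=1.0):
    # PUCT-style selection: Q + U, where U favors high-prior, low-visit moves.
    total = sum(ch.visits for ch in children.values())
    def score(item):
        _, ch = item
        u = c_puct * ch.prior * math.sqrt(total + 1) / (1 + ch.visits)
        return ch.q() + u
    return max(children.items(), key=score)[0]

# One round of search at a root position with three legal moves.
random.seed(0)
moves = ["A", "B", "C"]
priors = policy(None, moves)
children = {m: Edge(priors[m]) for m in moves}
for _ in range(100):                  # simulations
    m = select(children)              # selection guided by the priors
    v = value(None)                   # leaf evaluated by the value network
    children[m].visits += 1
    children[m].total_value += v
best = max(moves, key=lambda m: children[m].visits)
print(best, {m: children[m].visits for m in moves})
```

The key design point the sketch illustrates is that the networks narrow the search: priors from the policy network steer simulations toward plausible moves, and the value network replaces exhaustive playouts with a learned evaluation.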
The approach sharply reduced the search space AlphaGo had to explore compared with brute-force programs such as IBM’s Deep Blue (which defeated chess champion Garry Kasparov in 1997), making it fundamentally different from, and more human-like than, previous Go programs and addressing one of the major hurdles to building a strong Go AI. The number of legal board positions in Go is on the order of 10^170, a googol times larger than in chess, making a brute-force approach infeasible.
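A quick back-of-the-envelope calculation shows why brute force fails. Using the conventional rough figures (not measurements) of about 35 legal moves per turn over roughly 80 plies for chess, versus about 250 moves over roughly 150 plies for Go, the game-tree sizes diverge spectacularly; note these are game-tree counts, distinct from the 10^170 position count quoted above.

```python
import math

# b^d game-tree estimates (b = branching factor, d = typical game length).
chess = 80 * math.log10(35)    # ~35 moves per turn over ~80 plies
go = 150 * math.log10(250)     # ~250 moves per turn over ~150 plies
print(f"chess ~ 10^{chess:.0f} games, go ~ 10^{go:.0f} games")
```

Even under these crude assumptions Go's tree is hundreds of orders of magnitude larger than chess's, which is why pruning the search with learned networks matters so much.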
The researchers and engineers then
trained the neural networks on 30 million moves from games played by human experts, until it could predict the human move 57 percent of the time (the previous record before AlphaGo was 44 percent)... [and] learned to discover new strategies for itself, by playing thousands of games between its neural networks, and adjusting the connections using reinforcement learning.
Once the policy networks were trained, they were “in turn used to train the value networks, again by reinforcement learning from games of self-play. These value networks can evaluate any Go position and estimate the eventual winner”.
The approach DeepMind implemented could lead to applications well beyond Go. Demis Hassabis of Google DeepMind noted:
Because the methods we’ve used are general-purpose, our hope is that one day they could be extended to help us address some of society’s toughest and most pressing problems, from climate modelling to complex disease analysis.
When I was looking at the games, I didn’t know which is human, which is machine, I can’t tell the difference.
Look for more on this topic after the match in March on InfoQ Data Science.