The heroes of Hikaru’s Go were off by 86 years.
As some of you might have heard, the world of go – or weiqi as it is known in its homeland of China – is currently undergoing its Deep Blue moment as one of the world’s strongest players, Lee Sedol, faces off against Google’s DeepMind AlphaGo project. Deep Blue was the IBM/Carnegie Mellon supercomputer that in 1997 beat the world’s top grandmaster Garry Kasparov in a series of 6 chess games. But the computer’s margin of victory at 3.5 to 2.5 was modest, and the event was dogged by Kasparov’s allegations that the IBM team had underhandedly helped the computer. It would be an entire decade before the top computer chess programs decisively overtook the top human players. As of today, there is a 563 point difference between the Elo rating of Magnus Carlsen, the current highest rated human player in FIDE’s database, and the world’s most powerful chess program, the open source Stockfish 7. In practical terms, this means that Carlsen can expect to win fewer than one in a hundred games against Stockfish running on a contemporary 64-bit quadcore CPU.
In terms of game complexity, more orders of magnitude separate go from chess than chess from draughts, a game that has been fully solved. The aim is to capture territory and enemy stones by encircling them while defending your own turf, both of which are tallied up at the end of the game with the winner being the one with the most points. It is played on a 19×19 board, a lot larger than the 8×8 arrangement of chess, and you can position your pieces – or stones – on any empty space not occupied by or completely encircled by the enemy, whereas the range of possible moves in chess is strongly constricted. Chess is tactics, go is logistics; chess is combined arms, go is encirclements; chess draws strongly upon algorithmic and combinatorial thinking, whereas go is more about pattern matching and “intuition.” Therefore it is not surprising that until recently it was common wisdom that it would be many decades before computers would start beating the world’s top human players. The unimpressive performance of existing go computer programs, and the slowdown or end of Moore’s Law in the past few years, would have only given weight to that pessimistic assessment. (Or perhaps an optimistic one, if you’re with MIRI.) Lee Sedol himself thought the main question would be whether he would beat AlphaGo by 5-0 or 4-1.
Which makes it all the more remarkable that Lee Sedol is not just behind but, having lost all three of his games so far, is getting positively rekt.
But apparently Lee’s confidence was more rational than hubris. He had watched AlphaGo playing against weaker players, in which it made some apparent mistakes. But as a DeepMind research scientist noted, this was actually a feature, not a bug:
As Graepel explained, AlphaGo does not attempt to maximize its points or its margin of victory. It tries to maximize its probability of winning. So, Graepel said, if AlphaGo must choose between a scenario where it will win by 20 points with 80 percent probability and another where it will win by 1 and a half points with 99 percent probability, it will choose the latter. Thus, late in Game One, the system made some moves that Redmond considered mistakes—“slow” in his terminology. These moves seemed to give up points, but from where Graepel was sitting, AlphaGo was merely trying to maximize its chances.
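Graepel's example can be put as a toy decision rule. This is only a sketch of the idea in the quote, not AlphaGo's actual code: the `Outcome` class, both picker functions, and the "expected margin" score used for the points-greedy player are hypothetical names and simplifications made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    name: str
    win_probability: float  # chance that the line of play wins the game
    margin_if_won: float    # points ahead if the win comes through


def pick_by_win_probability(options):
    # The rule described in the quote: only the chance of winning matters.
    return max(options, key=lambda o: o.win_probability)


def pick_by_expected_margin(options):
    # A crude points-greedy rule for contrast (an assumption, not from the quote).
    return max(options, key=lambda o: o.win_probability * o.margin_if_won)


# The two scenarios from Graepel's example.
options = [
    Outcome("big but risky", 0.80, 20.0),
    Outcome("small but safe", 0.99, 1.5),
]

print(pick_by_win_probability(options).name)
print(pick_by_expected_margin(options).name)
```

The first rule takes the 1.5-point win at 99 percent, which is exactly the kind of move that looks "slow" to anyone counting points.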
In other words, while the projected points on the board – territory held plus stones captured – might for a long time appear to be roughly equal, at the same time the probability of ultimate victory would inexorably shift against Lee Sedol. And capped as our human IQs are, not only Lee but all the rest of us might be simply incapable of discerning the deeper strategies in play: “And so we boldly go – into the whirling knives” (to borrow from Nick Bostrom’s book on the risks of computer superintelligence).
Those are in fact the exact terms in which AI scientist/existential risks researcher Eliezer Yudkowsky analyzed this game in a lengthy Facebook post:
At this point it seems likely that Sedol is actually far outclassed by a superhuman player. The suspicion is that since AlphaGo plays purely for *probability of long-term victory* rather than playing for points, the fight against Sedol generates boards that can falsely appear to a human to be balanced even as Sedol’s probability of victory diminishes. The 8p and 9p pros who analyzed games 1 and 2 and thought the flow of a seemingly Sedol-favoring game ‘eventually’ shifted to AlphaGo later, may simply have failed to read the board’s true state. The reality may be a slow, steady diminishment of Sedol’s win probability as the game goes on and Sedol makes subtly imperfect moves that *humans* think result in even-looking boards.
For all we know from what we’ve seen, AlphaGo could win even if Sedol were allowed a one-stone handicap. But AlphaGo’s strength isn’t visible to us – because human pros don’t understand the meaning of AlphaGo’s moves; and because AlphaGo doesn’t care how many points it wins by, it just wants to be utterly certain of winning by at least 0.5 points.
In the third game, which finished just a few hours ago – by the way, you can watch the remaining two games live at the DeepMind YouTube channel, though make sure to learn the rules beforehand or it will be very boring – Lee Sedol, by then far behind on points, made a desperate ploy to salvage the game (or more likely just use the opportunity to test AlphaGo’s capabilities) by initiating a ko fight. A ko is a special case in go in which a local altercation sharply becomes the fulcrum around which the outcome of the entire game might be decided. Making the winning moves requires perfect, precise play as opposed to AlphaGo’s key method of playing out billions of random games and choosing the move that most often leads to a win.
The Korean Lee Sedol is the fourth highest rated go player on the planet. But even as of March 9, were it a person, AlphaGo would have already displaced him. The top player in the world is the Chinese Ke Jie, who is currently 100 Elo points higher than Lee. According to my calculations, this implies that Lee should win slightly more than a third of his matches against Ke Jie. His actual record is 2/8, or 25%. Not only is his current tally against AlphaGo 0/3, but he was beaten by a considerable number of points by an entity that is perfectly content to minimize its lead in order to maximize its winning probability.
Finally, a live predictions market on whether Lee Sedol would defeat AlphaGo in any of the three games remaining (that is, before the third match) varied between 20%-25%, implying that the probability of him winning any one game against the DeepMind monster was less than 10%. (If anything, those probabilities would be even lower now that AlphaGo has demonstrated ko isn’t its Achilles heel, but let us set that aside).
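For anyone who wants to check the "slightly more than a third" figure, here is a minimal sketch using the standard Elo expected-score formula. It assumes the go rating list in question follows the usual 400-point logistic scale; the function name is my own.

```python
def expected_score(rating_gap: float) -> float:
    """Expected score of the player rated `rating_gap` Elo points BELOW the opponent.

    Standard Elo logistic curve on the 400-point scale (an assumption about
    how the go ratings quoted here are calibrated).
    """
    return 1.0 / (1.0 + 10.0 ** (rating_gap / 400.0))


# Lee is about 100 points below Ke Jie.
print(f"{expected_score(100):.3f}")  # 0.360
```

So a 100 point deficit corresponds to winning roughly 36% of the time, against an actual record of 25%.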
According to my calculations, IF this predictions market is accurate, it would imply that AlphaGo has a ~400-450 Elo point superiority over Lee Sedol based on its performance up to and including the first two games against him.
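The arithmetic behind that estimate can be sketched as follows. It assumes the three remaining games are independent with the same win probability for Lee in each, and it uses the standard 400-point Elo scale; both function names are hypothetical.

```python
import math


def per_game_probability(p_at_least_one: float, games: int) -> float:
    """Per-game win chance implied by the chance of winning at least one of `games`.

    Assumes independent games with an identical win probability in each.
    """
    return 1.0 - (1.0 - p_at_least_one) ** (1.0 / games)


def elo_gap(win_probability: float) -> float:
    """Elo deficit of a player who wins with `win_probability` (standard Elo curve)."""
    return 400.0 * math.log10((1.0 - win_probability) / win_probability)


# Market odds of Lee taking at least one of the three remaining games.
for market in (0.20, 0.25):
    p = per_game_probability(market, 3)
    print(f"market {market:.0%}: per-game {p:.1%}, Elo gap {elo_gap(p):.0f}")
```

Under those assumptions the 20%-25% market range maps to a per-game chance of roughly 7%-9% and an Elo gap of roughly 400-450 points.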
It would also mean it would be far ahead of Ke Jie, who is the highest ranked human player ever and is currently virtually at his peak. Whereas Lee can only be expected to win 7%-9% of his games against AlphaGo, for Ke Jie this figure would be only modestly higher at 12%-15%. But in principle I see no reason why AlphaGo’s capabilities couldn’t be even higher than that. It’s a long tail – and we can’t see all that far ahead!
But really the most astounding element of this is that what took chess computing a decade to accomplish increasingly appears to have occurred in the space of a few days with AlphaGo – despite the slowdown in Moore’s Law in recent years, and the problems of go being far more challenging than those of chess in terms of traditional AI approaches.
For all intents and purposes AI has entered the superhuman realm in a problem space where merely human intelligence had hitherto ruled supreme, and even though we are as far away as ever from discovering the “Hand of God” – the metaphorical perfect game, which will take longer than the lifetime of the universe to compute if all of the universe were to become computronium – we might well be starting the construction of a Sliver of Him.
A win rate of 25% for Lee means that AlphaGo’s likely Elo superiority over Lee’s current 3519 points has just plummeted from 400-450 (based on the predictions market) to 191, i.e. a rating of 3710. That is still higher than top player Ke Jie at 3621.
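Here is a rough sketch of how a bare win-loss record turns into a rating estimate. It treats the observed win rate as the true win probability, which is a very strong assumption with only four games played, and uses the standard 400-point Elo scale; the function name is made up.

```python
import math

LEE_RATING = 3519  # Lee's rating as quoted in the text


def gap_from_record(wins: int, games: int) -> float:
    """Elo deficit implied by a raw record, taking wins/games as the win probability.

    Raises ZeroDivisionError for zero wins, which is why a 0/3 record
    gives no finite estimate.
    """
    win_rate = wins / games
    return 400.0 * math.log10((1.0 - win_rate) / win_rate)


gap = gap_from_record(1, 4)  # one win in four games
print(round(gap), LEE_RATING + round(gap))  # 191 3710
```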
If Lee loses the next game, that Elo difference goes up to 241; if he wins, it gets reduced further to about 70 (two wins in five is a 40% win rate). Regardless, we can now say with considerable confidence that AlphaGo is peak human level but decidedly not superhuman level.
Update 2 -
I had a look at go bots’ historical performance the other day. Looks like they move up by 1 S.D. every two years or so. Treating AlphaGo as the new base, humans should be *completely* outclassed by computers in go by around 2020.