Some concerns about the match conditions between AlphaZero and the shogi engine

uuunuuun
December 6, 2017

#Note 2018/12/18

One year after the preprint appeared, the paper was published in Science on December 7, 2018. The published version resolves all the concerns raised here and also includes the match records, which will be a treasure in the history of shogi.

After the paper (D. Silver et al., “Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm”, arXiv:1712.01815) appeared, the community of computer shogi programmers raised a few concerns about the match conditions between AlphaZero and the shogi engine “elmo”. Here I summarize the points with some explanations. (This information will be updated if errors are found.)

The resignation threshold seems too narrow. Recent shogi programs tend to report much larger evaluation values than chess programs do, and many people feel that -900 centipawns is too close to zero for shogi programs. I would guess that an acceptable value lies between -3000 and -5000. Official matches such as the World Computer Shogi Championship (http://www2.computer-shogi.org/index_e.html) set no resignation threshold and wait until a program resigns on its own. After 256 plies, the game is declared a draw even if the evaluation is one-sided.
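
To see why -900 looks narrow, here is a minimal Python sketch. It assumes a logistic mapping from centipawns to win probability with a scale of 600, a convention often quoted in computer shogi but not taken from the paper, and a hypothetical helper `adjudicate` that ends the game as soon as a single evaluation reaches the resignation threshold (real match harnesses usually require several consecutive moves). Under that assumption, -900 still leaves the losing side roughly an 18% chance, while -3000 leaves under 1%.

```python
import math
from typing import Optional

# Assumption: logistic mapping from centipawns to win probability with a
# scale of 600. This is a commonly quoted computer-shogi convention, not a
# value from the AlphaZero paper; each engine scales its scores differently.
EVAL_SCALE = 600.0

# Draw after 256 plies, as in the official matches described above.
MAX_PLIES = 256


def win_rate(eval_cp: float, scale: float = EVAL_SCALE) -> float:
    """Approximate win probability for an evaluation given in centipawns."""
    return 1.0 / (1.0 + math.exp(-eval_cp / scale))


def adjudicate(ply: int, eval_cp: float, resign_cp: Optional[float]) -> Optional[str]:
    """Hypothetical adjudication rule: return "resign", "draw", or None.

    eval_cp is the score of the engine to move, from its own point of view.
    resign_cp=None means no resignation threshold: the harness waits until
    the engine resigns by itself.
    """
    if resign_cp is not None and eval_cp <= resign_cp:
        return "resign"
    if ply >= MAX_PLIES:
        return "draw"
    return None


if __name__ == "__main__":
    print(f"{win_rate(-900):.3f}")       # 0.182
    print(f"{win_rate(-3000):.3f}")      # 0.007
    print(adjudicate(80, -1200, -900))   # resign
    print(adjudicate(80, -1200, -3000))  # None
```

With a -3000 threshold, a position scored at -1200 is played on instead of being resigned.
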
Setting “EnteringKingRule” to “NoEnteringKing” is strange. In recent matches between shogi programs, entering king occurs frequently, and how it is handled is critical to the results. When both kings have entered the opponent's territory, YaneuraOu counts its pieces and declares a win if it has enough. It is not clear whether AlphaZero has this functionality. I think it would be preferable to keep the default setting, “CSARule27”.
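
The declaration mentioned above can be sketched as follows. This encodes my understanding of the CSA 27-point rule, not YaneuraOu's code: the declaring side's king must be in the opponent's camp (the three farthest ranks), at least 10 of its other pieces must be there too, rooks and bishops (promoted or not) count 5 points and other pieces 1, pieces in hand also count, the first player (sente) needs 28 points and the second player (gote) 27, the king must not be in check, and time must remain. `Piece` and `can_declare_win` are hypothetical names, and the board encoding is simplified.

```python
from dataclasses import dataclass
from typing import List

# Assumed scoring: rook and bishop (promoted or not) count 5 points;
# every other piece except the king counts 1 point.
BIG_PIECES = {"R", "B", "+R", "+B"}


@dataclass
class Piece:
    owner: str  # "sente" or "gote"
    kind: str   # e.g. "K", "R", "+B", "G", "P"
    rank: int   # 1..9; ranks 1-3 are sente's target camp, 7-9 are gote's


def in_enemy_camp(owner: str, rank: int) -> bool:
    return rank <= 3 if owner == "sente" else rank >= 7


def points(kind: str) -> int:
    return 5 if kind in BIG_PIECES else 1


def can_declare_win(side: str, board: List[Piece], hand: List[str],
                    in_check: bool, time_left: float) -> bool:
    """Sketch of the 27-point declaration check for `side` (hypothetical)."""
    king = next((p for p in board if p.owner == side and p.kind == "K"), None)
    if king is None or not in_enemy_camp(side, king.rank):
        return False
    # Non-king pieces of the declaring side inside the opponent's camp.
    camp = [p for p in board
            if p.owner == side and p.kind != "K" and in_enemy_camp(side, p.rank)]
    if len(camp) < 10:
        return False
    # Pieces in the camp plus pieces in hand are counted.
    total = sum(points(p.kind) for p in camp) + sum(points(k) for k in hand)
    needed = 28 if side == "sente" else 27
    return total >= needed and not in_check and time_left > 0
```

A program that cannot make this declaration may keep shuffling pieces in a position that another engine would already claim as a win, which is why the setting matters.
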
The hash size may be too small, and its setting is tricky. YaneuraOu 2017 Early has two hash-size settings: “Hash”, whose default is 16 MB, and “USI_Hash”, whose default is 1024 MB. YaneuraOu ignores the latter, so the former is the one that matters. If “Hash” is left at its default value, I observe that the program becomes very weak. Under the match conditions (35M nodes per second at one minute per move), even 1 GB may be too small, so it would be more appropriate to set a larger value.
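
A back-of-the-envelope check of the hash size, as a Python sketch. It assumes one minute per move, the 35M nodes per second quoted for elmo, and about 10 bytes per transposition-table entry (a Stockfish-like figure; YaneuraOu's actual layout may differ). `nodes_per_entry` and `setoption_commands` are hypothetical helpers; the second simply emits standard USI `setoption` lines for both options, so the value takes effect whichever one the build reads.

```python
from typing import List

# Assumption: roughly 10 bytes per transposition-table entry. This is a
# Stockfish-like figure used for illustration only.
BYTES_PER_ENTRY = 10


def nodes_per_entry(nps: float, seconds_per_move: float, hash_mb: int,
                    bytes_per_entry: int = BYTES_PER_ENTRY) -> float:
    """Searched nodes per hash-table entry during a single move."""
    entries = hash_mb * 1024 * 1024 / bytes_per_entry
    return nps * seconds_per_move / entries


def setoption_commands(hash_mb: int) -> List[str]:
    """USI commands that set both hash options to the same size in MB."""
    return [
        f"setoption name Hash value {hash_mb}",
        f"setoption name USI_Hash value {hash_mb}",
    ]


if __name__ == "__main__":
    # 35M nodes/s for 60 s is about 2.1 billion nodes per move.
    for mb in (16, 1024, 16384):
        print(mb, "MB:", round(nodes_per_entry(35e6, 60, mb), 1), "nodes per entry")
    print("\n".join(setoption_commands(16384)))
```

Under these assumptions, 1 GB already means about 20 nodes competing for each entry, and the 16 MB default about 64 times more, which is consistent with the program becoming very weak.
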
(Added December 14; this is my own view.) It is difficult to compare programs running on different architectures. AlphaZero runs on accelerators (4 Google TPUs), while YaneuraOu runs on a CPU (model not specified, 64 threads). Both are impressive hardware, but it is hard to judge whether this choice is fair to both programs. Let me scale both down to commonplace hardware and see what happens. On the accelerator side, I pick an NVIDIA GTX 1070 GPU, which costs about $400. Four TPUs deliver 720 TFLOPS, while the GTX 1070 delivers 6.3 TFLOPS, a difference of roughly a factor of 100. On the CPU side, YaneuraOu searches 35M nodes per second on the 64-thread CPU but about 3M nodes per second on a commonplace CPU such as the Core i7-6700 (about $300) with 4 threads, a factor of about 12. The paper plots Elo rating against thinking time (Figure 2). Treating an N-times slower machine as having N times less thinking time, a 100-times slowdown costs AlphaZero about R1600, while a 12-times slowdown costs YaneuraOu/elmo about R600. On DeepMind's high-end platform, AlphaZero beat YaneuraOu/elmo 90-8 with 2 draws, an Elo difference of about 400. On a commonplace PC, therefore, YaneuraOu/elmo would seem to be stronger than AlphaZero by a margin of about R600 (1600 - 600 - 400 = 600)!
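
The arithmetic in the item above can be checked with a few lines of Python. `elo_diff` uses the standard logistic Elo formula with draws counted as half a point. The 1600 and 600 rating losses are the values read off Figure 2, hard-coded here as assumptions, and the whole estimate inherits the assumption that slower hardware is equivalent to proportionally less thinking time.

```python
import math


def elo_diff(wins: int, losses: int, draws: int) -> float:
    """Elo difference implied by a match score (draws count half a point)."""
    score = (wins + 0.5 * draws) / (wins + losses + draws)
    return 400 * math.log10(score / (1 - score))


# Hardware ratios quoted in the text.
tpu_vs_gtx1070 = 720 / 6.3   # about 114, i.e. "roughly 100"
elmo_64t_vs_i7 = 35 / 3      # about 12

# Assumed rating losses for those slowdowns, as read off Figure 2.
ALPHAZERO_LOSS = 1600
ELMO_LOSS = 600

high_end_gap = elo_diff(90, 8, 2)  # AlphaZero minus elmo, about 402
# Using the rounded gap of 400 from the text; negative means elmo is ahead.
common_pc_gap = 400 - ALPHAZERO_LOSS + ELMO_LOSS

print(round(high_end_gap))  # 402
print(common_pc_gap)        # -600
```
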
Unlike in the chess case, DeepMind has not published any game records (kifu) of the matches between AlphaZero and elmo. Many people are curious and eager to see them!

Finally, I would like to mention that 2017 was a “dog year” for shogi engines: progress was so fast that we now have plenty of programs much stronger than elmo. For instance, “Heisei shogi gassen ponpoko” (“ponpoko” for short), the winner of the Shogi Denou Tournament (http://denou.jp/tournament2017/), is about R150 stronger than elmo. This program is available at https://github.com/nodchip/hakubishin-/releases as “tanuki-sdt5-2017-11-16”. Apery_sdt5 is also known to have an even stronger evaluation file (available at https://t.co/S7q7XlW4dG), about R200 stronger than elmo. Currently the strongest evaluation file is “aperypaq”, an improved version of Apery_sdt5 (available at http://qhapaq.hatenablog.com/entry/2017/11/28/195426), about R250 stronger than elmo. These evaluation files are meant to be combined with YaneuraOu. I hope the authors will test these programs before declaring that AlphaZero beats the currently available shogi programs.
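
As a rough illustration, the figures in the paragraph above can be turned into expected scores with the standard Elo formula. This sketch assumes the R values are Elo differences relative to elmo that can be chained together with AlphaZero's +400, which need not hold across different programs and hardware. `RELATIVE_TO_ELMO` and `expected_score` are hypothetical names, and this is not the rating table that follows.

```python
# Assumed ratings relative to elmo, taken from the text above.
RELATIVE_TO_ELMO = {
    "AlphaZero (4 TPUs)": 400,
    "aperypaq": 250,
    "Apery_sdt5": 200,
    "ponpoko": 150,
    "elmo": 0,
}


def expected_score(diff: float) -> float:
    """Standard Elo expected score for a player rated `diff` points higher."""
    return 1.0 / (1.0 + 10 ** (-diff / 400))


az = RELATIVE_TO_ELMO["AlphaZero (4 TPUs)"]
for name, rating in RELATIVE_TO_ELMO.items():
    if name.startswith("AlphaZero"):
        continue
    gap = az - rating
    print(f"AlphaZero vs {name}: gap {gap}, expected score {expected_score(gap):.2f}")
```

Under these assumptions, the margin against the newer evaluation files would shrink noticeably compared with the 90-8 result against elmo.
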

Here is a virtual rating table including AlphaZero and recent programs.