deep-learning reinforcement-learning hexagonal-tiles

Apply alpha-zero-general to Abalone (a hexagonal board game)

I'm trying to apply alpha-zero-general to Abalone. The original code of alpha-zero-general is here: https://github.com/suragnair/alpha-zero-general

It ships implementations of several rectangular board games such as connect4, gobang, othello, tictactoe, and tictactoe_3d. However, Abalone has a hexagonal board, and I'm not sure how to describe its board as a 2D array, which is what the network takes as input.

I had written my own code before I found alpha-zero-general. This is the board list I originally used:

       [  2, 2, 0, 1, 1  ], 
      [  2, 2, 2, 1, 1, 1  ],
     [  0, 2, 2, 0, 1, 1, 0  ],
   [  0, 0, 0, 0, 0, 0, 0, 0  ],
  [  0, 0, 0, 0, 0, 0, 0, 0, 0 ],
   [  0, 0, 0, 0, 0, 0, 0, 0  ],
    [  0, 1, 1, 0, 2, 2, 0  ],
      [  1, 1, 1, 2, 2, 2  ],
       [  1, 1, 0, 2, 2  ]

Here 1 stands for black and 2 stands for white. However, I found that it cannot be fed into a CNN, because the rows have different lengths.

My idea is to map it onto a 2D numpy array, but I don't know whether this is suitable:

[  2, 2, -1, -1, 0, 1, 1, 2, 2  ], 
[  2, 2, -1, -1, -1, 1, 1, 1, 2  ],
[  2, 0, -1, -1, 0, 1, 1, 0, 2  ],
[  2, 0, 0, 0, 0, 0, 0, 0, 0  ],
[  0, 0, 0, 0, 0, 0, 0, 0, 0  ],
[  0, 0, 0, 0, 0, 0, 0, 0, 2  ],
[  2, 0, 1, 1, 0, -1, -1, 0, 2  ],
[  2, 1, 1, 1, -1, -1, -1, 2, 2  ],
[  2, 2, 1, 1, 0, -1, -1, 2, 2  ]

where 1 stands for black, -1 stands for white, and 2 stands for invalid squares.

Is this idea ok?

Is there a recommended way to apply it to a hexagonal board game, or is there an existing example that I can refer to?

Solution

If you are still interested in this topic: I've been using a 9x9 char array that is similar to the official Abalone board game notation:

https://en.wikipedia.org/wiki/Abalone_(board_game)

The figure on Wikipedia is confusing, but if you arrange the array as shown further down it becomes clearer.

'a' and 'b' are the black and white marbles in the classic board game setup. The dash fields '-' are not part of the hexagon grid, and the blank fields ' ' are empty positions on the grid. You can write down all three direction axes using only row and column offsets.

If you move a marble up or down a row, the movement happens along a diagonal (top left to bottom right):

axis 1 -> position[row+1][column] or position[row-1][column]

axis 2 -> position[row][column+1] or position[row][column-1]

axis 3 -> position[row+1][column-1] or position[row-1][column+1]

Axis 3 is the other diagonal direction. It should be easier to recognize in the following shifted array table:
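Since the question is about Python, here is a minimal sketch of the three axes as neighbour offsets. The names `DIRECTIONS`, `on_board` and `neighbours` are my own and not part of alpha-zero-general, and the on-board test assumes the dash corners sit at the top left and bottom right, as in the array in this answer.

```python
SIZE = 9     # the array is 9x9
RADIUS = 4   # the hexagon has 4 rings around the centre cell

# Six neighbour offsets (d_row, d_col), two per axis.
DIRECTIONS = [
    (1, 0), (-1, 0),    # axis 1
    (0, 1), (0, -1),    # axis 2
    (1, -1), (-1, 1),   # axis 3
]

def on_board(row, col):
    """Return True if (row, col) is inside the array and on the hexagon."""
    if not (0 <= row < SIZE and 0 <= col < SIZE):
        return False
    # The cut-off corners are the cells whose row + col differs
    # from 2 * RADIUS by more than RADIUS.
    return abs(row + col - 2 * RADIUS) <= RADIUS

def neighbours(row, col):
    """Return the on-board neighbours of a cell along the three axes."""
    return [
        (row + d_row, col + d_col)
        for d_row, d_col in DIRECTIONS
        if on_board(row + d_row, col + d_col)
    ]
```

The centre cell (4, 4) gets all six neighbours, while a corner cell such as (0, 4) only keeps three, which is what you would expect on a hexagon.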

char globalBoard[9][9] = { // (classic setup)

// -4  -3  -2  -1   0   1   2   3   4   centered notation
//  0   1   2   3   4   5   6   7   8   indexed notation
//-----------------------------------------------
 {'-','-','-','-','a','a','a','a','a'},      //-4       0
   {'-','-','-','a','a','a','a','a','a'},      //-3       1
     {'-','-',' ',' ','a','a','a',' ',' '},      //-2       2
       {'-',' ',' ',' ',' ',' ',' ',' ',' '},      //-1       3
         {' ',' ',' ',' ',' ',' ',' ',' ',' '},      // 0       4
           {' ',' ',' ',' ',' ',' ',' ',' ','-'},      // 1       5
             {' ',' ','b','b','b',' ',' ','-','-'},      // 2       6
               {'b','b','b','b','b','b','-','-','-'},      // 3       7
                 {'b','b','b','b','b','-','-','-','-'}       // 4       8
};//                                                         centered  indexed
//                                                           notation  notation
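To connect this with the question, here is a rough Python sketch that pads a ragged row list into the same skewed 9x9 layout and then splits it into 0/1 planes. The function names `ragged_to_skewed` and `to_planes` are hypothetical, the 1 / -1 / 0 / 2 encoding is the one proposed in the question, and using separate planes plus a mask is only one possible way to feed a board to a CNN, not something alpha-zero-general prescribes.

```python
BLACK, WHITE, EMPTY, INVALID = 1, -1, 0, 2  # encoding proposed in the question

def ragged_to_skewed(rows):
    """Pad a ragged list of hexagon rows into a square, skewed board.

    Rows above the middle are padded on the left, rows below it on the
    right, so the invalid corners end up at the top left and bottom right.
    """
    size = len(rows)      # 9 for Abalone
    radius = size // 2    # 4 for Abalone
    board = []
    for r, row in enumerate(rows):
        left = max(0, radius - r)        # invalid cells before the row
        right = size - left - len(row)   # invalid cells after the row
        board.append([INVALID] * left + list(row) + [INVALID] * right)
    return board

def to_planes(board):
    """Split the board into three 0/1 planes: black, white, on-board mask."""
    black = [[int(v == BLACK) for v in row] for row in board]
    white = [[int(v == WHITE) for v in row] for row in board]
    mask = [[int(v != INVALID) for v in row] for row in board]
    return [black, white, mask]

# The start position from the question, with white already written as -1.
rows = [
    [-1, -1, 0, 1, 1],
    [-1, -1, -1, 1, 1, 1],
    [0, -1, -1, 0, 1, 1, 0],
    [0] * 8,
    [0] * 9,
    [0] * 8,
    [0, 1, 1, 0, -1, -1, 0],
    [1, 1, 1, -1, -1, -1],
    [1, 1, 0, -1, -1],
]
board = ragged_to_skewed(rows)
planes = to_planes(board)
```

Both results are plain nested lists, so `numpy.array(board)` gives a 9x9 array and `numpy.array(planes)` a 3x9x9 one. Unlike a centered padding, the skewed padding keeps every hexagon neighbour at a fixed row/column offset.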

I'm currently working on something similar. I'm trying to get my desktop robotic arm to play the board game against me using game algorithms such as minimax, alpha–beta pruning and Monte Carlo tree search.

I came across this post because I was searching for someone who has used machine learning to play the game, especially the AlphaZero network.

I'd love to hear how your project has developed so far :)