Consider the following centered hexagonal bitboard representation (padding is in boldface):
56
55 49
54 48 42
53 47 41 35
52 46 40 34 28
45 39 33 27
44 38 32 26 20
37 31 25 19
36 30 24 18 12
29 23 17 11
28 22 16 10 04
21 15 09 03
20 14 08 02 60
13 07 01 59
06 00 58
63 57
56
This representation fits in a 64-bit integer and allows for easy movement in the 6 hexagonal directions by rotating bits 1, 7 or 8 spaces to the right or to the left respectively. If it helps with visualization, you can deform this hexagon into a square:
42 43 44 45 46 47 48
35 36 37 38 39 40 41
28 29 30 31 32 33 34
21 22 23 24 25 26 27
14 15 16 17 18 19 20
07 08 09 10 11 12 13
00 01 02 03 04 05 06
Now, what I want to do is rotate this bitboard 60° clockwise, such that the [45,46,47,38,39,31] triangle becomes the [48,41,34,40,33,32] triangle, etc. How do I do this?
This permutation is kind of a mess, with every relevant bit having a distinct move-distance. The permutation diagram looks like this (top row is output):
That does suggest some approaches though. If we look near the top, every "group" is formed by gathering some bits from the input in ascending order, so it can be done with 7 compress_right operations aka PEXT
which is efficient on Intel (not so efficient on AMD so far). What that really comes down to is sampling the vertical columns, so extracting bits with a stride of 8.
So if PEXT
is acceptable, it could be done like this (not tested):
uint64_t g0 = _pext_u64(in, 0x8080808);
uint64_t g1 = _pext_u64(in, 0x404040404);
uint64_t g2 = _pext_u64(in, 0x20202020202);
uint64_t g3 = _pext_u64(in, 0x1010101010101);
uint64_t g4 = _pext_u64(in, 0x808080808080);
uint64_t g5 = _pext_u64(in, 0x404040404000);
uint64_t g6 = _pext_u64(in, 0x202020200000);
uint64_t out = g0 | (g1 << 7) | (g2 << 14) | (g3 << 21) |
(g4 << 28) | (g5 << 35) | (g6 << 42);
This permutation is not routable by a butterfly network, but Beneš networks are universal so that will work.
So it can be done with 11 of these permute steps, also known as delta swaps:
word bit_permute_step(word source, word mask, int shift) {
word t;
t = ((source >> shift) ^ source) & mask;
return (source ^ t) ^ (t << shift);
}
There is some choice in how to create the exact masks, but this works:
x = bit_permute_step(x, 0x1001400550054005, 1);
x = bit_permute_step(x, 0x2213223111023221, 2);
x = bit_permute_step(x, 0x01010B020104090E, 4);
x = bit_permute_step(x, 0x002900C400A7007B, 8);
x = bit_permute_step(x, 0x00000A0400002691, 16);
x = bit_permute_step(x, 0x0000000040203CAD, 32);
x = bit_permute_step(x, 0x0000530800001CE0, 16);
x = bit_permute_step(x, 0x000C001400250009, 8);
x = bit_permute_step(x, 0x0C00010403080104, 4);
x = bit_permute_step(x, 0x2012000011100100, 2);
x = bit_permute_step(x, 0x0141040000000010, 1);