Search code examples
c++performance64-bitbit-manipulationbitboard

"Isolate" specific Row/Column/Diagonal from a 64-bit number


OK, let's consider a 64-bit number, with its bits forming a 8x8 table.

E.g.

0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 1 0 1 1 1 1 0 1 0 0 1 1 0 1 0 1 0 1 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 1 1 0 0 1 1 0 1 0 1 0

written as

a b c d e f g h
----------------
0 1 1 0 1 0 1 0
0 1 1 0 1 0 1 1 
0 1 1 1 1 0 1 0 
0 1 1 0 1 0 1 0 
1 1 1 0 1 0 1 0 
0 1 1 0 1 0 1 0 
0 1 1 0 1 1 1 0 
0 1 1 0 1 0 1 0

Now, what if we want to isolate JUST e.g. column d (00100000) (or any row/diagonal for that matter) ?

Can this be done? And if so, how?


HINTS :

  • (a) My main objective here - though not initially mentioned - is raw speed. I'm searching for the fastest algorithm around, since the "retrieval" function is being performed some millions of times per second.

  • (b) This is what comes closer to what I mean : https://www.chessprogramming.org/Kindergarten_Bitboards


Solution

  • Here's a solution with only 4 main steps:

    const uint64_t column_mask = 0x8080808080808080ull;
    const uint64_t magic = 0x2040810204081ull;
    
    int get_col(uint64_t board, int col) {
        uint64_t column = (board << col) & column_mask;
        column *= magic;
        return (column >> 56) & 0xff;
    }
    

    It works like this:

    • the board is shifted to align the column with the left side
    • it's masked to only contain the required column (0..8)
    • it's multiplied by a magic number which results in all the original bits pushed to the left side
    • the left-most byte is shifted to the right

    The magic number is chosen to copy only the needed bits and let the rest fall into unused places / overflow over the number. The process looks like this (digits are bit "IDs", rather than the number itself):

    original column: ...1.......2.......3.......4.......5.......6.......7.......8....
    aligned column:  1.......2.......3.......4.......5.......6.......7.......8.......
    multiplied:      123456782345678.345678..45678...5678....678.....78......8.......
    shifted to right:........................................................12345678
    

    If you add the const keywords, assembly becomes quite nice actually:

    get_col:
    .LFB7:
            .cfi_startproc
            movl    %esi, %ecx
            movabsq $-9187201950435737472, %rax
            salq    %cl, %rdi
            andq    %rax, %rdi
            movabsq $567382630219905, %rax
            imulq   %rax, %rdi
            shrq    $56, %rdi
            movl    %edi, %eax
            ret
    

    No branching, no external data, around 0.4ns per calculation.

    Edit: takes around 6th of the time using NPE's solution as baseline (next fastest one)