Recognising a chess piece with bitboards

When the chessboard is stored in a variety of bitboards, how do modern chess engines recognise what type/side piece is situated on a particular cell? I'm having problems with this, since to find out what type/side piece a particular bit is, I have to always do:

if((bit_check & occupied) == 0ULL) ... // empty
else if((bit_check & white) != 0ULL) // white
    if((bit_check & white_pawns) != 0ULL) ... // white pawns
    else if((bit_check & white_rooks) != 0ULL) ... // white rooks
    ....
    else if((bit_check & white_kings) != 0ULL) ... // white kings
else if((bit_check & black) != 0ULL) // black
    if((bit_check & black_pawns) != 0ULL) ... // black pawns
    ....
    else if((bit_check) & black_kings) != 0ULL) ... // black kings

This is quite a tedious process and it has to be done quite a few times (for example, during move generation to see what is being captured). I'm not sure if I should just go with this or whether it would be faster to simply create a 64 array of type Piece[64], which will inherently store the piece type.

Which would be better, considering it will have to be millions of times, for capture analysis in the search functions. Am I doing this wrong?

Solution

The bit check itself is fast; I'd be worried mostly about the branching.

Instead, consider uint64_t bitboards[12] as an array of the 12 bitboards for all pieces. This is now contiguous in memory and can be scanned in a loop:

for (int i = 0; i != 12; ++i)
{
  if (bitboards[i] && bit_check) return i;
}
return -1; // empty.

Only two branches (loop and check) is easier for the branch predictor and the contiguous memory optimizes the prefetcher.

Obvious variations are checking bitboards[0] to [5] for white pieces only and [6] to [11] for black pieces only.

A more subtle variant:

uint64_t bitboards[13];
bitboards[12] = ~uint64_t(0);
for (int i = 0; /* NO TEST*/ ; ++i)
{
     if (bitboards[i] && bit_check) return i;
}

Instead of returning -1 for empty, this will return 12 (the sentinel value). However, this replaces the conditional loop branch with a faster unconditional branch. It also means that the return value is always int i.

Another unrelated optimization is to recognize that pawns are the most common pieces, so it's more efficient to use bitboards[0] for white pawns and either bitboards[1] or bitboards[6] for black pawns, depending on whether you interleave black or white pieces.

[edit] If you have a separate bitboard for color, you then do not need two bitboards for white pawns and black pawns. Instead, have a single bitboard for pawns. To check for a black pawn, AND the two values. (bit_check & color & bitboard[0]). To check for a white pawn, invert the color (bit_check & ~color & bitboard[0])