I do a lot of bit vector operations in my software. For example, to store a boolean flag for a candidate 'n', I do the following:
uint64_t *information_vector;
uint32_t pos = n / 64;
uint32_t bit_pos = n % 64;
information_vector[pos] |= ((uint64_t)1 << bit_pos);
and I follow a similar procedure when reading that information:
uint32_t pos = n / 64;
uint32_t bit_pos = n % 64;
if (information_vector[pos] & ((uint64_t)1 << bit_pos)) {
// do something
}
In between, I also write the information_vector to disk and read it back again. I am now chasing a bug that is giving me nightmares, and it struck me that endianness might be the culprit, but I cannot explain how. Is there any way I can check? Is this kind of bit vector manipulation generally endian-safe and portable across architectures?
I also see that elsewhere in the code I set some other information for the same candidate in another bit vector:
uint32_t byte_position = n / 8;
uint8_t bit_position = n % 8;
another_information_vector[byte_position] |= (1 << bit_position);
I usually find the common set of attributes by AND-ing these bit vectors.
In most cases the safest variant is to operate at the byte level, i.e. with a divisor of 8. On the other hand, this can be suboptimal: some architectures have no direct byte access, or byte access is more expensive than word access.
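As a minimal sketch of the byte-level variant (the helper names bitvec_set and bitvec_test are made up for illustration, not taken from the question), each cell is a single byte, so the bytes written to disk do not depend on the host's byte order:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers: a bit vector stored as plain bytes.
   Bit n lives in byte n / 8, at position n % 8 counted from the LSB.
   A one-byte cell has no internal byte order, so the in-memory and
   on-disk layout is the same on little- and big-endian hosts. */
static void bitvec_set(uint8_t *vec, size_t n)
{
    vec[n / 8] |= (uint8_t)(1u << (n % 8));
}

static int bitvec_test(const uint8_t *vec, size_t n)
{
    return (vec[n / 8] >> (n % 8)) & 1;
}
```

The caller is assumed to have allocated at least n / 8 + 1 bytes; the sketch does no bounds checking.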
On a little-endian machine, the same approach works unchanged for any reasonable divisor (8, 16, 32, 64). For example, for bit index 22, byte-level access addresses bit 6 of the byte with index 2; 16-bit access addresses bit 6 of the short word with index 1; and so forth.
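One way to check this on a given machine is to set a bit through 64-bit cells and then look at the same memory byte by byte. The sketch below is only an illustration; all three function names are hypothetical. It uses UINT64_C(1) for the mask because shifting a plain int constant by 32 or more positions is undefined behaviour in C wherever int is 32 bits wide.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 on a little-endian host, 0 otherwise. */
static int host_is_little_endian(void)
{
    const uint32_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);   /* lowest-addressed byte of probe */
    return first == 1;
}

/* Word-level set with 64-bit cells; UINT64_C(1) keeps the shift 64 bits wide. */
static void set_bit_u64(uint64_t *vec, size_t n)
{
    vec[n / 64] |= UINT64_C(1) << (n % 64);
}

/* Byte-level view of the same memory: is bit n set, counting
   from the LSB of the byte with index n / 8? */
static int test_bit_bytes(const void *mem, size_t n)
{
    const unsigned char *bytes = mem;
    return (bytes[n / 8] >> (n % 8)) & 1;
}
```

After set_bit_u64(v, 22), test_bit_bytes(v, 22) is expected to return 1 on a little-endian host, where both views agree, and 0 on a big-endian host, where the set bit ends up in the byte with index 5 instead of index 2.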
On a big-endian machine, this requires replacing 1 << bit_position with 1 << (BITS_PER_CELL-1-bit_position), or equivalently HIGHEST_BIT >> bit_position, where HIGHEST_BIT is 0x80 for uint8_t, 0x80000000 for uint32_t, etc. Bit index 0 then means the MSB of byte 0, as opposed to the little-endian case, where it means the LSB of byte 0.
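A minimal sketch of the MSB-first mask for 32-bit cells follows. BITS_PER_CELL and HIGHEST_BIT follow the naming above; the two function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_CELL 32u
#define HIGHEST_BIT   UINT32_C(0x80000000)

/* MSB-first numbering: bit index 0 is the most significant bit of cell 0. */
static void set_bit_msb_first(uint32_t *vec, size_t n)
{
    vec[n / BITS_PER_CELL] |= HIGHEST_BIT >> (n % BITS_PER_CELL);
}

static int test_bit_msb_first(const uint32_t *vec, size_t n)
{
    return (vec[n / BITS_PER_CELL] & (HIGHEST_BIT >> (n % BITS_PER_CELL))) != 0;
}
```

With this numbering, bit index 22 maps to the mask 0x00000200 in cell 0. On a big-endian host that is the byte with index 2, the same byte that MSB-first byte-level access (0x80 >> 6) would touch.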
(A similar effect can be seen on serial links. In RS-232 and Ethernet, each byte is transmitted from LSB to MSB. The individual/group bit of a MAC address is the very first bit on the wire, yet it is the LSB of the first octet.)