Search code examples
cbinarybit-manipulation

Bit manipulations


I am learning bit manipulation and have come across the following code. I'm trying to work out what it does:

long func_c(unsigned long x) {
  long val = 0;


  // sums of bits in x (done in parallel for each byte segment)
  for (int i = 0; i < 8; i++) {
    val += (x & 0x0101010101010101L);
    x >>= 1;
  }

  // adds the top 32 bits to the lowest 32 but why?
  val += (val >> 32);
// adds the top 16 bits to the lowest 16 but why?
  val += (val >> 16);
// adds the top 8 bits to the lowest 8 but why?
  val += (val >> 8);

  // gets the lowest byte
  return val & 0xff;
}

What are the series of right shifts trying to achieve?


Solution

  • First and most important, the code is incorrect. The three lines at the end of the function should be:

      val += (val >> 32);
      val += (val >> 16);
      val += (val >> 8);
    

    Your comments were correct, but the code was not. With the above change, the function works correctly.

    The for loop creates in val a set of 8 bytes, each one of which contains the number of '1' bits in the corresponding byte of x.

    So remember, we are trying to get the number of '1' bits in the original x, and so far all we have is byte 0 of val containing the number of '1' bits in byte 0 of x, byte 1 of val is the number of '1' bits in byte 1 of x, etc. So we want to add together the 8 bytes of val.

    The code gets some additional parallelism by adding the upper 4 bytes of val to the lower 4 bytes of val. Remember that each byte has a maximum value of 8, so when you add these two 32-bit values together, each byte adds to the corresponding other byte without overflow. So the 32-bit result now has partial sums of the bytes of the original val.

    Then, get some more parallelism by splitting the 32-bit result into two 16-bit values and adding them together. Once again, the corresponding bytes get added together without overflow.

    The final add of the two halves of the 16-bit number gives you the total number of '1' bits in the original x.