Search code examples
c++calgorithmblasbitset

efficient bitwise sum calculation


Is there an efficient way to calculate a bitwise sum of uint8_t buffers (assume number of buffers are <= 255, so that we can make the sum uint8)? Basically I want to know how many bits are set at the i'th position of each buffer.

Ex: For 2 buffers

uint8 buf1[k] -> 0011 0001 ...
uint8 buf2[k] -> 0101 1000 ...
uint8 sum[k*8]-> 0 1 1 2 1 0 0 1... 

are there any BLAS or boost routines for such a requirement?

This is a highly vectorizable operation IMO.

UPDATE: Following is a naive impl of the requirement

for (auto&& buf: buffers){
  for (int i = 0; i < buf_len; i++){
    for (int b = 0; b < 8; ++b) {
      sum[i*8 + b] += (buf[i] >> b) & 1;
    }
  }
}

Solution

  • An alternative to OP's naive code:

    Perform 8 additions at once. Use a lookup table to expand the 8 bits to 8 bytes with each bit to a corresponding byte - see ones[].

    void sumit(uint8_t number_of_buf, uint8_t k, const uint8_t buf[number_of_buf][k]) {
      static const uint64_t ones[256] = { 0, 0x1, 0x100, 0x101, 0x10000, 0x10001, 
          /* 249 more pre-computed constants */ 0x0101010101010101};
    
      uint64_t sum[k];
      memset(sum, 0, sizeof sum):
    
      for (size_t buf_index = 0; buf_index < number_of_buf;  buf_index++) {
        for (size_t int i = 0; i < k; i++) {
          sum[i] += ones(buf[buf_index][i]);
        }
      }
    
      for (size_t int i = 0; i < k; i++) {
        for (size_t bit = 0; bit < 8;  bit++) {
          printf("%llu ", 0xFF & (sum[i] >> (8*bit)));
        }
      }
    }
    

    See also @Eric Postpischil.