Search code examples
cperformancebit-manipulationbitwise-operatorsbitvector

Multiple bitvectors; how to find bits that are set exactly n times?


I have a collection of four bitvectors, for example:

b1 = 00001010
b2 = 10100111
b3 = 10010010
b4 = 10111110

I would like to get the masks of those bits that are set in exactly 0, 1, 2, 3 or 4 of the given bitvectors. So m0 would be the mask of bits that are not set in any of the four bitvectors, m3 is the mask of those bits that are set in exactly three of the bitvectors, etc.:

m0 = 01000000
m1 = 00000001
m2 = 00111100
m3 = 10000000
m4 = 00000010

What is the fastest way to find these masks using bitwise operators?

I assume these have the least operations for 0 and 4 bits:

m0 = ~(b1 | b2 | b3 | b4)  // 4 ops
m4 = b1 & b2 & b3 & b4     // 3 ops

For the other options I'm not so sure my methods have the least operations:

m1 = ((b1 ^ b2) & ~(b3 | b4)) | (~(b1 | b2) & (b3 ^ b4)) // 9 operations
m2 = ((b1 ^ b2) & (b3 ^ b4)) | ((b1 ^ b3) & (b2 ^ b4)) | ((b1 ^ b4) & (b2 ^ b3)) // 11 operations
m3 = ((b1 ^ b2) & (b3 & b4)) | ((b1 & b2) & (b3 ^ b4)) // 7 operations

Is this the fastest way to calculate these masks or can I do it faster (in fewer operations)?

For most of the cases I need one or a few of these masks, not all of them at the same time.

(Note, in reality I will be doing this for 64 or 128 bit vectors. It is probably irrelevant, but I do it in C on a 32-bit x86 platform.)


Solution

  • 14 operations for all masks.

    The idea is to first sort the bits, using min = x & y and max = x | y as conditional swap. This costs 10 operations. Then simply extract the masks which costs 4 operations.

    // Split in lower and upper half
    var c1 = b1 & b2;
    var c3 = b1 | b2;
    var c2 = b3 & b4;
    var c4 = b3 | b4;
    
    // Sort within each half
    var d1 = c1 & c2; // (b1 & b2) & (b3 & b4)
    var d2 = c1 | c2; // (b1 & b2) | (b3 & b4)
    var d3 = c3 & c4; // (b1 | b2) & (b3 | b4)
    var d4 = c3 | c4; // (b1 | b2) | (b3 | b4)
    
    // Sort middle
    var e1 = d1;      // (b1 & b2) & (b3 & b4)
    var e2 = d2 & d3; // ((b1 & b2) | (b3 & b4)) & ((b1 | b2) & (b3 | b4))
    var e3 = d2 | d3; // ((b1 & b2) | (b3 & b4)) | ((b1 | b2) & (b3 | b4))
    var e4 = d4;      // (b1 | b2) | (b3 | b4)
    
    // Extract masks
    var m4 = e1;      // (b1 & b2) & (b3 & b4)
    var m3 = e2 ^ e1; // (((b1 & b2) | (b3 & b4)) & ((b1 | b2) & (b3 | b4))) ^ ((b1 & b2) & (b3 & b4))
    var m2 = d3 ^ d2; // The same as e3 ^ e2, saves two operations if only m2 is required
    var m1 = e4 ^ e3; // ((b1 | b2) | (b3 | b4)) ^ (((b1 & b2) | (b3 & b4)) | ((b1 | b2) & (b3 | b4)))
    var m0 = ~e4;     // ~((b1 | b2) | (b3 | b4))
    

    (The code is in C#, but it's trivial to convert this to C.)


    If you use this code calculate only some masks and simply remove the lines that don't affect the result (a decent compiler should do this automatically). The performance isn't bad either:

    m4: 3 operations
    m3: 9 operations
    m2: 7 operations
    m1: 9 operations
    m0: 4 operations