Search code examples
algorithmfunctionbinary

Find nth SET bit in an int


Instead of just the lowest set bit, I want to find the position of the nth lowest set bit. (I'm NOT talking about value on the nth bit position)

For example, say I have:
0000 1101 1000 0100 1100 1000 1010 0000

And I want to find the 4th bit that is set. Then I want it to return:
0000 0000 0000 0000 0100 0000 0000 0000

If popcnt(v) < n, it would make sense if this function returned 0, but any behavior for this case is acceptable for me.

I'm looking for something faster than a loop if possible.


Solution

  • It turns out that it is indeed possible to do this with no loops. It is fastest to precompute the (at least) 8 bit version of this problem. Of course, these tables use up cache space, but there should still be a net speedup in virtually all modern pc scenarios. In this code, n=0 returns the least set bit, n=1 is second-to-least, etc.

    Solution with __popcnt

    There is a solution using the __popcnt intrinsic (you need __popcnt to be extremely fast or any perf gains over a simple loop solution will be moot. Fortunately most SSE4+ era processors support it).

    // lookup table for sub-problem: 8-bit v
    byte PRECOMP[256][8] = { .... } // PRECOMP[v][n] for v < 256 and n < 8
    
    ulong nthSetBit(ulong v, ulong n) {
        ulong p = __popcnt(v & 0xFFFF);
        ulong shift = 0;
        if (p <= n) {
            v >>= 16;
            shift += 16;
            n -= p;
        }
        p = __popcnt(v & 0xFF);
        if (p <= n) {
            shift += 8;
            v >>= 8;
            n -= p;
        }
    
        if (n >= 8) return 0; // optional safety, in case n > # of set bits
        return PRECOMP[v & 0xFF][n] << shift;
    }
    

    This illustrates how the divide and conquer approach works.

    General Solution

    There is also a solution for "general" architectures- without __popcnt. It can be done by processing in 8-bit chunks. You need one more lookup table that tells you the popcnt of a byte:

    byte PRECOMP[256][8] = { .... } // PRECOMP[v][n] for v<256 and n < 8
    byte POPCNT[256] = { ... } // POPCNT[v] is the number of set bits in v. (v < 256)
    
    ulong nthSetBit(ulong v, ulong n) {
        ulong p = POPCNT[v & 0xFF];
        ulong shift = 0;
        if (p <= n) {
            n -= p;
            v >>= 8;
            shift += 8;
            p = POPCNT[v & 0xFF];
            if (p <= n) {
                n -= p;
                shift += 8;
                v >>= 8;
                p = POPCNT[v & 0xFF];
                if (p <= n) {
                    n -= p;
                    shift += 8;
                    v >>= 8;
                }
            }
        }
    
        if (n >= 8) return 0; // optional safety, in case n > # of set bits
        return PRECOMP[v & 0xFF][n] << shift;
    }
    

    This could, of course, be done with a loop, but the unrolled form is faster and the unusual form of the loop would make it unlikely that the compiler could automatically unroll it for you.