Search code examples
algorithmbit-manipulationnibble

Fastest way to count bytes that contain a 4-bit value at any bit offset?


I have an algorithm to search an array that contains 64 bytes of values and I want to find a 4 bit value and see how many times it occurs in the array.

For example, the first element in the array might be 10101010 and the 4 bit value is 1010 in which this only counts once. If the next element is 10000000, this would count as 0.

It's easy to go through all elements in the array and check each element if our 4 bit result is in it but is there a faster way to do this?


Solution

  • precompute:
    - 4bit key offset by 0bit
    - 4bit key offset by 1bit
    - 4bit key offset by 2bit
    - 4bit key offset by 3bit
    
    - 4 bit mask offset by 0bit
    - 4 bit mask offset by 1bit
    - 4 bit mask offset by 2bit
    - 4 bit mask offset by 3bit
    
    for each byte in bytes:
      for each offset:
        mask out bytes we care about
        check if it matches our precomputed key
    
    • Since any item we don't look at could be a match or not, we have to look at all items, so O(n) at least.
    • Since it's sufficient to look at all items once, there's no reason to use a non-linear algorithm.
    • Since it's only 64 values total, highly optimized SIMD instructions are probably not worth the hassle. It's unlikely to improve performance.

    The precomputing of the keys and the masks are done by shifting the key left one bit at a time, and by setting the 4 relevant bits to 1 in the masks.

    match = (bytes[i] & (0x0F << offset)) == (key << offset)
    

    for some key and each of the offsets 0-3. Since (key << offset) and (0x0F << offset) will be reused every four loops, precomputing the four values and unrolling the loop will be faster. Your compiler might do this for you, but if not, here's how to do it manually:

    matches = 0
    for (int i = 0; i < 64; i += 4) {
        const mask0 = 0x0F << 0
        const key0 = key << 0
        match0 = (bytes[i+0] & mask0) == key0
    
        const mask1 = 0x0F << 1
        const key1 = key << 1
        match1 = (bytes[i+1] & mask1) == key1
    
        ...
    
        matches += match0 + match1 + match2 + match3
    }