optimizing string search for a single byte

In general, string search algorithms (like Boyer-Moore) are optimized for cases where the search string is long-ish. That is, Boyer-Moore is great because by lining up the search string with our text, we can skip N = len(search string) characters if the end of the search string doesn't match the text.

But what if our search string is really short? Like a single byte or character? In this case, Boyer-Moore doesn't help much.

So, what are some alternative algorithms for speeding up searching?

I know that many optimized library search routines (like memchr in C), take the strategy of reading the input string word by word, rather than char by char. So on a 64-bit machine, 8 bytes can be examined at once, rather than a single byte.

I'd like to know how these optimized string/byte searches actually work. How does the actual comparison work then? I know it obviously must involve bit masking - but I don't see how performing all the bit masking is any better than simply search character by character.

So, suppose our search char is 0xFF. Ignoring alignment issues, let's say we have some input buffer: void* buf. We can read it word by word by saying:

const unsigned char search_char = 0xFF;
unsigned char* bufptr = static_cast<unsigned char*>(buf);
unsigned char* bufend = bufptr + BUF_SIZE;

while (bufptr != bufend)
{
   // Ignore alignment concerns for now, assume BUF_SIZE % sizeof(uintptr_t) == 0
   //
   std::uinptr_t next_word = *reinterpret_cast<std::uintptr_t*>(bufptr);

   // ... but how do we compare next_word with our search char?

   bufptr += sizeof(std::uintptr_t);
}

I also realize that the above code is not strictly portable, because std::uintptr_t isn't guaranteed to actually be word size. But let's assume for the sake of this question that std::uinptr_t is equal to the processor word size. (An actual implementation would likely need platform-specific macros to get the actual word-size)

So, how do we actually check if the byte 0xFF occurs anywhere in the value of next_word?

We can use OR operations of course, but it seems we'd still need to perform a lot of OR'ing and bit shifting to check each byte of next_word, at which point it becomes questionable whether this optimization is actually any better than simply scanning character by character.

Solution

You can use this snippet from Bit Twiddling Hacks:

#define haszero(v) (((v) - 0x01010101UL) & ~(v) & 0x80808080UL)
#define hasvalue(x,n) \
(haszero((x) ^ (~0UL/255 * (n))))

It effectively XORs each byte with the character to be tested, then determines if any byte is now zero.

At this point you can get the location of the matching byte (or bytes) from the return value of the expression, e.g. the value will be 0x00000080 if the least significant byte matches the value.