Search code examples
cglibcstrlen

How the magic bits are improving the strlen function in glibc


I was going through the source of strlen for glibc. They have used magic bits to find the length of string. Can someone please explain how it is working. Thank you


Solution

  • Let's say this function is looking through a string -- 4 bytes at a time, as explained by the comments (we assume long ints are 4 bytes) -- and the current "chunk" look like this:

    '\3'     '\3'     '\0'     '\3'
    00000011 00000011 00000000 00000011  (as a string: "\x03\x03\x00\x03")
    

    The strlen function is just looking for the first zero byte in this string. It first determines, for each 4-byte chunk, whether there's any zero byte in there, by checking this magic_bits shortcut first: it adds the 4 bytes to this value:

    01111110 11111110 11111110 11111111
    

    Adding any non-zero bytes to this value will cause the 1's to overflow into the holes marked by zeroes, by propagating the carries. For our chunk, it'd look like this:

      11111111 111111          1 1111111   Carries
      00000011 00000011 00000000 00000011  Chunk
      01111110 11111110 11111110 11111111  Magic bits
    + -----------------------------------
      10000010 00000001 11111111 00000010
      ^      ^        ^        ^         
    

    (The hole bits are marked by ^'s.)

    And, from the comments:

           /* Look at only the hole bits.  If any of the hole bits
              are unchanged, most likely one of the bytes was a
              zero.  */
    

    If there's no zeroes in the chunk, all of the hole bits will get set to 1's. However, because of the zero byte, one hole bit didn't get filled by a propagating carry, and we can then go check which byte it was.

    Essentially, it speeds up the strlen calculation by applying some bit addition magic to 4-byte chunks to scan for zeroes, before narrowing down the search to single byte comparisons.