I was going through the source of strlen for glibc. They have used magic bits
to find the length of string. Can someone please explain how it is working.
Thank you
Let's say this function is looking through a string -- 4 bytes at a time, as explained by the comments (we assume long ints
are 4 bytes) -- and the current "chunk" look like this:
'\3' '\3' '\0' '\3'
00000011 00000011 00000000 00000011 (as a string: "\x03\x03\x00\x03")
The strlen function is just looking for the first zero byte in this string. It first determines, for each 4-byte chunk, whether there's any zero byte in there, by checking this magic_bits
shortcut first: it adds the 4 bytes to this value:
01111110 11111110 11111110 11111111
Adding any non-zero bytes to this value will cause the 1's to overflow into the holes marked by zeroes, by propagating the carries. For our chunk, it'd look like this:
11111111 111111 1 1111111 Carries
00000011 00000011 00000000 00000011 Chunk
01111110 11111110 11111110 11111111 Magic bits
+ -----------------------------------
10000010 00000001 11111111 00000010
^ ^ ^ ^
(The hole bits are marked by ^
's.)
And, from the comments:
/* Look at only the hole bits. If any of the hole bits
are unchanged, most likely one of the bytes was a
zero. */
If there's no zeroes in the chunk, all of the hole bits will get set to 1
's. However, because of the zero byte, one hole bit didn't get filled by a propagating carry, and we can then go check which byte it was.
Essentially, it speeds up the strlen calculation by applying some bit addition magic to 4-byte chunks to scan for zeroes, before narrowing down the search to single byte comparisons.