I am reading the "strlen" source code from the glibc, and the trick developers found to speed it up is to read n bytes where n is the size of a long word, instead of reading 1 byte at each iteration.
I will assume that a long word has 4 bytes.
The tricky part is that every "chunk" of 4 bytes the function reads can contain a null byte, so at each iteration, the function has to check if there was a null byte in the chunk. They do it like
if (((longword - lomagic) & ~longword & himagic) != 0) { /* null byte found */ }
where longword
is the chunk of data and himagic
and lowmagic
are magical values defined as:
himagic = 0x80808080L;
lomagic = 0x01010101L;
Here is the comment for thoses values
/* Bits 31, 24, 16, and 8 of this number are zero. Call these bits
the "holes." Note that there is a hole just to the left of
each byte, with an extra at the end:
bits: 01111110 11111110 11111110 11111111
bytes: AAAAAAAA BBBBBBBB CCCCCCCC DDDDDDDD
The 1-bits make sure that carries propagate to the next 0-bit.
The 0-bits provide holes for carries to fall into. */
How does this trick of finding the null byte work?
From the famous "Bit Twiddling Hacks" page By Sean Eron Anderson, a description of what is currently used in the glibc
implementation you're referring to (Anderson calls the algorithm hasless(v, 1)
):
The subexpression
(v - 0x01010101UL)
, evaluates to a high bit set in any byte whenever the corresponding byte in v is zero or greater than0x80
. The sub-expression~v & 0x80808080UL
evaluates to high bits set in bytes where the byte of v doesn't have its high bit set (so the byte was less than0x80
). Finally, by ANDing these two sub-expressions the result is the high bits set where the bytes in v were zero, since the high bits set due to a value greater than0x80
in the first sub-expression are masked off by the second.
It appears that the comment(s) in the glibc
source is confusing because it doesn't apply to what the code is actually doing anymore - it's describing what would have been an implementation of the algorithm that Anderson describes just before the hasless(v, 1)
algorithm is described.