Search code examples
cassemblybit-manipulationstrlen

Understanding the magic number 0x07EFEFEFF used for strlen optimization


I stumbled upon this answer regarding the utilization of the magic number 0x07EFEFEFF used for strlen's optimization, and here is what the top answer says:

Look at the magic bits. Bits number 16, 24 and 31 are 1. 8th bit is 0.

  • 8th bit represents the first byte. If the first byte is not zero, 8th bit becomes 1 at this point. Otherwise it's 0.
  • 16th bit represents the second byte. Same logic.
  • 24th bit represents the third byte.
  • 31th bit represents the fourth byte.

However, if I calculate result = ((a + magic) ^ ~a) & ~magic with a = 0x100, I find that result = 0x81010100, meaning that according to the top answerer, the second byte of a equals 0, which is obviously false.

What am I missing?

Thanks!


Solution

  • The bits only tell you if a byte is zero if the lower bytes are non-zero -- so it can only tell you the FIRST 0 byte, but not about bytes after the first 0.

    • bit8=1 means first byte is zero. Other bytes, unknown
    • bit8=0 means first byte is non-zero
    • bit8=0 & bit16=1 means second byte is zero, higher bytes unknown
    • bit8=0 & bit16=0 mans first two bytes are non-zero.

    Also, the last bit (bit31) only tells you about 7 bits of the last byte (and only if the first 3 bytes are non-zero) -- if it is the only bit set then the last byte is 0 or 128 (and the rest are non-zero).