Search code examples
c++ctype-conversionstrlenstrict-aliasing

Fast strlen function vs aliasing rules


I found this "fast strlen function" implementation:

// for x86 only
size_t my_strlen(const char *s) {
    size_t len = 0;
    for(;;) {
        unsigned x = *(unsigned*)s;
        if((x & 0xFF) == 0) return len;
        if((x & 0xFF00) == 0) return len + 1;
        if((x & 0xFF0000) == 0) return len + 2;
        if((x & 0xFF000000) == 0) return len + 3;
        s += 4, len += 4;
    }
}

An optimization techique used here is obviously simple: read memory by natural CPU words (the code is old and assumes x32 CPU), rather then by simple bytes.

But this code is violating aliasing rules, and so cause undefined behaviour, which can be freely optimized out by the compiler (those making code even faster, but inccorect).

I also see now that it is not portable, as it tied to the little-endian endianness.

Or may be i am completly wrong here and the code above is correct? Is it correct for C? For C++?


Solution

  • This is just very bad code. Even the code's author warns:

    • This function will crash if an non-readable memory page is located right after the end of the string. The simplest way to prevent this is to allocate 3 additional bytes at the end of string.

    • The dwords may be unaligned, but x86 architecture allows access to unaligned data. For small strings, the alignment will take more time than the penalty of unaligned reads.

    • The code is not portable: you will have to add another 4 conditions if you use a 64-bit processor. For big-endian architectures, the order of conditions should be reversed.

    Even if this did not break the aliasing rule the burden placed on the coder to make my_strlen work is completely unjustifiable. It has been stated several times that strlen will already be tuned beyond anything the average coder could accomplish.

    But an additional statement should be made for C++: Bjarne Stroustrup, the creator of C++, in the last page of chapter 4 in his book: "The C++ Programming Language" says:

    Prefer strings over C-style strings

    You'll find taking the size of a string far more performant than finding the size of a C-String.

    EDIT:

    In your comment you say you are working with StaticallyBufferedString which purports to solve string's "pooling memory model" which causes:

    • Unnecessary heap locks in a multithreaded context
    • Fragmentation from real-time size control

    I'd like to suggest C++17's string_view which like all of C++17 was constructed with multi-threading in mind. It provides the functionality of a string backed by heap and constexpr friendly C-Strings. You can even get a jump start on learning about it with namespace experimental: http://en.cppreference.com/w/cpp/experimental/basic_string_view Unlike your time put in on StaticallyBufferedStrings the knowledge you gain will be perfectly portable and applicable to any future C++ work you do!