I found this "fast strlen function" implementation:
// for x86 only
size_t my_strlen(const char *s) {
size_t len = 0;
for(;;) {
unsigned x = *(unsigned*)s;
if((x & 0xFF) == 0) return len;
if((x & 0xFF00) == 0) return len + 1;
if((x & 0xFF0000) == 0) return len + 2;
if((x & 0xFF000000) == 0) return len + 3;
s += 4, len += 4;
}
}
An optimization techique used here is obviously simple: read memory by natural CPU words (the code is old and assumes x32 CPU), rather then by simple bytes.
But this code is violating aliasing rules, and so cause undefined behaviour, which can be freely optimized out by the compiler (those making code even faster, but inccorect).
I also see now that it is not portable, as it tied to the little-endian endianness.
Or may be i am completly wrong here and the code above is correct? Is it correct for C? For C++?
This is just very bad code. Even the code's author warns:
This function will crash if an non-readable memory page is located right after the end of the string. The simplest way to prevent this is to allocate 3 additional bytes at the end of string.
The dwords may be unaligned, but x86 architecture allows access to unaligned data. For small strings, the alignment will take more time than the penalty of unaligned reads.
The code is not portable: you will have to add another 4 conditions if you use a 64-bit processor. For big-endian architectures, the order of conditions should be reversed.
Even if this did not break the aliasing rule the burden placed on the coder to make my_strlen
work is completely unjustifiable. It has been stated several times that strlen
will already be tuned beyond anything the average coder could accomplish.
But an additional statement should be made for C++: Bjarne Stroustrup, the creator of C++, in the last page of chapter 4 in his book: "The C++ Programming Language" says:
Prefer
strings
over C-style strings
You'll find taking the size of a string
far more performant than finding the size of a C-String.
EDIT:
In your comment you say you are working with StaticallyBufferedString which purports to solve string
's "pooling memory model" which causes:
I'd like to suggest C++17's string_view
which like all of C++17 was constructed with multi-threading in mind. It provides the functionality of a string
backed by heap and constexpr
friendly C-Strings. You can even get a jump start on learning about it with namespace experimental
: http://en.cppreference.com/w/cpp/experimental/basic_string_view Unlike your time put in on StaticallyBufferedStrings the knowledge you gain will be perfectly portable and applicable to any future C++ work you do!