Tags: c++, clang, compiler-optimization, stdstring, libc++

Why is initializing a string to "" more efficient than the default constructor?


Generally, the default constructor should be the fastest way of making an empty container. That's why I was surprised to see that it's worse than initializing to an empty string literal:

#include <string>

std::string make_default() {
    return {};
}

std::string make_empty() {
    return "";
}

With clang 16 and libc++, this compiles to:

make_default():
        mov     rax, rdi
        xorps   xmm0, xmm0
        movups  xmmword ptr [rdi], xmm0
        mov     qword ptr [rdi + 16], 0
        ret
make_empty():
        mov     rax, rdi
        mov     word ptr [rdi], 0
        ret

See live example at Compiler Explorer.

Notice how returning {} is zeroing 24 bytes in total, but returning "" is only zeroing 2 bytes. How come return ""; is so much better?


Solution

  • This is an intentional decision in libc++'s implementation of std::string.

    First of all, std::string uses the so-called Small String Optimization (SSO): very short (or empty) strings are stored directly inside the string object instead of in dynamically allocated memory. That's why we don't see any allocations in either case.

    In libc++, the "short representation" of a std::string consists of:

    Size (x86_64)   Meaning
    1 bit           "short flag" indicating that it is a short string (zero means yes)
    7 bits          length of the string, excluding null terminator
    0 bytes         padding bytes to align string data (none for basic_string<char>)
    23 bytes        string data, including null terminator

    For an empty string, we only need to store two bytes of information:

    • one zero-byte for the "short flag" and the length
    • one zero-byte for the null terminator

    The constructor accepting a const char* writes only these two bytes, the bare minimum. The default constructor "unnecessarily" zeroes all 24 bytes of the std::string object. This may be better overall though, because it lets the compiler zero arrays of strings in bulk, using std::memset or wide SIMD stores.
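Whichever constructor runs, the resulting string should look the same from the outside; only the number of bytes written differs. A minimal sketch that checks this using only the public std::string interface (the helper names is_properly_empty and check_equivalence are made up for illustration):

```cpp
#include <cassert>
#include <string>

// The same two functions as in the question.
std::string make_default() { return {}; }
std::string make_empty() { return ""; }

// True if s is empty and its buffer starts with a null terminator.
bool is_properly_empty(const std::string& s) {
    return s.empty() && s.size() == 0 && s.c_str()[0] == '\0';
}

// Both ways of constructing the string yield the same observable state.
void check_equivalence() {
    assert(is_properly_empty(make_default()));
    assert(is_properly_empty(make_empty()));
    assert(make_default() == make_empty());
}
```

This says nothing about the object representation, which is where the two constructors differ in libc++.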

    For a full explanation, see below:

    Initializing to "" / Calling string(const char*)

    To understand what happens, let's look at the libc++ source code for std::basic_string:

    // constraints...
    /* specifiers... */ basic_string(const _CharT* __s)
      : /* leave memory indeterminate */ {
        // assert that __s != nullptr
        __init(__s, traits_type::length(__s));
        // ...
      }
    

    This ends up calling __init(__s, 0), where 0 is the length of the string, obtained from std::char_traits<char>::length:

    // template head etc...
    void basic_string</* ... */>::__init(const value_type* __s, size_type __sz)
    {
        // length and constexpr checks
        pointer __p;
        if (__fits_in_sso(__sz))
        {
            __set_short_size(__sz); // set size to zero, first byte
            __p = __get_short_pointer();
        }
        else
        {
            // not entered
        }
        traits_type::copy(std::__to_address(__p), __s, __sz); // copy string, nothing happens
        traits_type::assign(__p[__sz], value_type()); // add null terminator
    }
    

    __set_short_size will end up writing only a single byte, because the short representation of a string is:

    struct __short
    {
        struct _LIBCPP_PACKED {
            unsigned char __is_long_ : 1; // set to zero when active
            unsigned char __size_ : 7;    // set to zero for empty string
        };
        char __padding_[sizeof(value_type) - 1]; // zero size array
        value_type __data_[__min_cap]; // null terminator goes here
    };
    

    After compiler optimizations, zeroing __is_long_, __size_, and one byte of __data_ compiles to:

    mov word ptr [rdi], 0
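To make the two-byte claim concrete, here is a rough sketch with a hypothetical MockShort type. It only imitates the layout described above (flag and length sharing the first byte, followed by 23 bytes of data) and is not libc++'s real type; the helper names are invented, and the layout assumes a mainstream ABI where both bit-fields share one byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for the short representation (not libc++'s type).
struct MockShort {
    unsigned char is_long : 1;  // zero means "short string"
    unsigned char size : 7;     // length, excluding the null terminator
    char data[23];              // characters plus null terminator
};
static_assert(sizeof(MockShort) == 24, "assumes both bit-fields share one byte");

// Turns arbitrary storage into an empty short string.
void init_empty(MockShort& s) {
    s.is_long = 0;
    s.size = 0;
    s.data[0] = '\0';
}

// Fills the object with 0xFF, initializes it, and counts the bytes that changed.
std::size_t bytes_changed_by_init() {
    MockShort s;
    std::memset(&s, 0xFF, sizeof s);
    init_empty(s);

    unsigned char raw[sizeof s];
    std::memcpy(raw, &s, sizeof s);

    std::size_t changed = 0;
    for (unsigned char b : raw) {
        if (b != 0xFF) ++changed;
    }
    return changed;
}
```

Under these assumptions, only the first two bytes are touched, which matches the single 16-bit store in the assembly.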
    

    Initializing to {} / Calling string()

    The default constructor is more wasteful by comparison:

    /* specifiers... */ basic_string() /* noexcept(...) */
      : /* leave memory indeterminate */ {
        // ...
        __default_init();
    }
    

    This ends up calling __default_init(), which does:

    /* specifiers... */ void __default_init() {
        __r_.first() = __rep(); // set representation to value-initialized __rep
        // constexpr-only stuff...
    }
    

    Value-initializing a __rep produces 24 zero bytes, because:

    struct __rep {
        union {
            __long  __l; // first union member gets initialized,
            __short __s; // __long representation is 24 bytes large
            __raw   __r;
        };
    };
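The effect of value-initializing such a union can be sketched with hypothetical stand-in types. MockLong, MockShort and MockRep below are simplified imitations, not libc++'s definitions. The sketch assumes a typical 64-bit platform where a null pointer is represented by all-zero bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-ins for the long and short representations.
struct MockLong {
    std::size_t cap;
    std::size_t size;
    char* data;
};
struct MockShort {
    unsigned char size_and_flag;
    char data[23];
};
struct MockRep {
    union {
        MockLong l;   // first member: zero-initialization targets this one
        MockShort s;
    };
};

// Counts the nonzero bytes in a value-initialized MockRep.
std::size_t nonzero_bytes_after_value_init() {
    MockRep r = MockRep();  // value-initialization, as in __default_init()

    unsigned char raw[sizeof r];
    std::memcpy(raw, &r, sizeof r);

    std::size_t nonzero = 0;
    for (unsigned char b : raw) {
        if (b != 0) ++nonzero;
    }
    return nonzero;
}
```

Because the first union member spans the whole object here, value-initialization has to clear every byte, not just the two that an empty short string needs.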
    

    Conclusion

    If you want to value-initialize everywhere for the sake of consistency, don't let this keep you from it. Zeroing out a few bytes unnecessarily isn't a big performance problem you need to worry about.

    In fact, zeroing the whole object can help when initializing many strings at once, because the compiler may then use std::memset or wide SIMD stores to zero the memory in bulk.
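A small sketch of the bulk case, using a hypothetical helper make_many_empty that value-initializes 64 strings in one go. Whether the compiler actually lowers this to a memset call or to SIMD stores depends on the compiler, the optimization flags and the standard library, so it is worth checking the generated assembly rather than assuming it.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: value-initializes a block of strings at once.
// With an all-zero default state, the compiler is free to zero the
// whole array in bulk instead of initializing each string separately.
std::array<std::string, 64> make_many_empty() {
    return {};
}

// Counts the strings in the array that are not empty.
std::size_t count_nonempty(const std::array<std::string, 64>& strings) {
    std::size_t n = 0;
    for (const std::string& s : strings) {
        if (!s.empty()) ++n;
    }
    return n;
}
```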