Search code examples
c++optimizationconstructorheap-memorystdstring

Are two heap allocations more expensive than a call to std::string fill ctor?


I want to have a string with a capacity of 131 chars (or bytes). I know two simple ways of achieving that. So which of these two code blocks is faster and more efficient?

std::string tempMsg( 131, '\0' ); // constructs the string with a 131 byte buffer from the start
tempMsg.clear( ); // clears those '\0' chars to free space for the actual data

tempMsg += "/* some string literals that are appended */";

or this one:

std::string tempMsg; // default constructs the string with a 16 byte buffer
tempMsg.reserve( 131 ); // reallocates the string to increase the buffer size to 131 bytes??

tempMsg += "/* some string literals that are appended */";

I guess the first approach only uses 1 allocation and then sets all those 131 bytes to 0 ('\0') and then clears the string (std::string::clear is generally constant according to: https://www.cplusplus.com/reference/string/string/clear/). The second approach uses 2 allocations but on the other hand, it doesn't have to set anything to '\0'. But I've also heard about compilers allocating 16 bytes on the stack for a string object for optimization purposes. So the 2nd method might use only 1 heap allocation as well.

So is the first method faster than the other one? Or are there any other better methods?


Solution

  • The most accurate answer is that it depends. The most probable answer is the second being faster or as fast. Calling the fill ctor requires not only a heap allocation but a fill (typically translates to a memset in my experience).

    clear usually won't do anything with a POD char besides setting a first pointer or size integer to zero because char is a trivially-destructible type. There's no loop involved with clear usually unless you create std::basic_string with a non-trivial UDT. It's constant-time otherwise and dirt-cheap in practically every standard library implementation.

    Edit: An Important Note: I never encountered a standard lib implementation that does this or it has slipped my memory (very possible as I think I'm turning senile), but there is something very important that Viktor Sehl pointed out to me that I was very ignorant about in the comments:

    Please note that std::string::clear() on some implementations free the allocated memory (if there are any), unlike a std::vector. –

    That would actually make your first version involve two heap allocations. But the second should still only be one (opposite of what you thought).

    Resumed:

    But I've also heard about compilers allocating 16 bytes on the stack for a string object for optimization purposes. So the 2nd method might use only 1 heap allocation as well.

    Small Buffer Optimizations

    The first allocation is a small-buffer stack optimization for implementations that use it (technically not always stack, but it'll avoid additional heap allocations). It's not separately heap-allocated and you can't avoid it with a fill ctor (the fill ctor will still allocate the small buffer). What you can avoid is filling the entire array with '\0' before you fill it with what you actually want, and that's why the second version is likely faster (marginally or not depending on how many times you invoke it from a loop). That's needless overhead unless the optimizer eliminates it for you, and it's unlikely in my experience that optimizers will do that in loopy cases that can't be optimized with something like SSA.

    I just pitched in here because your second version is also clearer in intent than filling a string with something as an attempted optimization (in this case a very possibly misguided one if you ask me) only to throw it out and replace it with what you actually want. The second is at least clearer in intent and almost certainly as fast or faster in most implementations.

    On Profiling

    I would always suggest measuring though if in doubt, and especially before you start attempting funny things like in your first example. I can't recommend the profiler enough if you're in working in performance-critical fields. The profiler will not only answer this question for you but it'll also teach you to refrain from writing such counter-intuitive code like in the first example except in places where it makes a real positive difference (in this case I think the difference is actually negative or neutral). From my perspective, the use of both profiler and debugging should be something ideally taught in CS 101. The profiler helps mitigate the dangerous tendency for people to optimize the wrong things very counter-productively. They tend to be very easy to use; you just run them and make your code perform the expensive operation you want to optimize and you get back nice results like so:

    enter image description here

    If the small buffer optimization confuses you a bit, a simple illustration is like this:

    struct SomeString
    {
        // Pre-allocates (always) some memory in advance to avoid additional
        // heap allocs.
        char small_buffer[some_small_fixed_size] = {};
    
        // Will point to small buffer until string gets large.
        char* ptr = small_buffer;
    };
    

    The allocation of the small buffer is unavoidable, but it doesn't require separate calls to malloc/new/new[]. And it's not allocated separately on the heap from the string object itself (if it is allocated on heap). So both of the examples that you showed involve, at most, a single heap allocation (unless your standard library implementation is FUBAR -- edit: or one that Viktor is using). What the first example has conceptually on top of that is a fill/loop (could be implemented as a very efficient intrinsic in assembly but loopy/linear time stuff nevertheless) unless the optimizer eliminates it.

    String Optimization

    So is the first method faster than the other one? Or are there any other better methods?

    You can write your own string type which uses an SBO with, say, 256 bytes for the small buffer which is typically going to be much larger than any std::string optimization. Then you can avoid heap allocations entirely for your 131-length case.

    template <class Char, size_t SboSize=256>
    class TempString
    {
    private:
        // Stores the small buffer.
        Char sbo[SboSize] = {};
    
        // Points to the small buffer until num > SboSize.
        Char* ptr = sbo;
    
       // Stores the length of the string.
       size_t num = 0;
    
       // Stores the capacity of the string.
       size_t cap = SboSize;
    
    public:
        // Destroys the string.
        ~TempString()
        {
            if (ptr != sbo)
                delete[] ptr;
        }
    
        // Remaining implementation left to reader. Note that implementing
        // swap requires swapping the contents of the SBO if the strings
        // point to them rather than swapping pointers (swapping is a
        // little bit tricky with SBOs involved, so be wary of that).
    };
    

    That would be ill-suited for persistent storage though because it would blow up memory use (ex: requiring 256+ bytes just to store a string with one character in it) if you stored a bunch of strings persistently in a container. It's well-suited for temporary strings though you transfer into and out of function calls. I'm primarily a gamedev so rolling our own alternatives to the standard C++ library is quite normal here given our requirements for real-time feedback with high graphical fidelity. I wouldn't recommend it for the faint-hearted though, and definitely not without a profiler. This is a very practical and viable option in my field although it might be ridiculous in yours. The standard lib is excellent but it's tailored for the needs of the entire world. You can usually beat it if you can tailor your code very specifically to your needs and produce more narrowly-applicable code.

    Actually, even std::string with SBOs is rather ill-suited for persistent storage anyway and not just TempString above because if you store like std::unordered_map<std::string, T> and std::string uses a 16-byte SBO inflating sizeof(std::string) to 32 bytes or more, then your keys will require 32 bytes even if they just store one character fitting only two strings or less in a single cache line on traversal of the hash table. That's a downside to using SBOs. They can blow up your memory use for persistent storage that's part of your application state. But they're excellent for temporaries whose memory is just pushed and popped to/from stack in a LIFO alloc/dealloc pattern which only requires incrementing and decrementing a stack pointer.

    If you want to optimize the storage of many strings though from a memory standpoint, then it depends a lot on your access patterns and needs. However, a fairly simple solution is like so if you want to just build a dictionary and don't need to erase specific strings dynamically:

    // Just using a struct for simplicity of illustration:
    struct MyStrings
    {
        // Stores all the characters for all the null-terminated strings.
        std::vector<char> buffer;
    
        // Stores the starting index into the buffer for the nth string.
        std::vector<std::size_t> string_start;
    
        // Inserts a null-terminated string to the buffer.
        void insert(const std::string_view str)
        {
            string_start.push_back(buffer.size());
            buffer.insert(buffer.end(), str.begin(), str.end());
            buffer.push_back('\0');
        }
    
        // Returns the nth null-terminated string.
        std::string_view operator[](int32_t n) const
        {
            return {buffer.data() + string_start[n]};
        }
    };
    

    Another common solution that can be very useful if you store a lot of duplicate strings in an associative container or need fast searches for strings that can be looked up in advance is to use string interning. The above solution can also be combined to implement an efficient way to store all the interned strings. Then you can store lightweight indices or pointers to your interned strings and compare them immediately for equality, e.g., without involving any loops, and store many duplicate references to strings that only cost the size of an integer or pointer.