
Default advice for using C-style string literals vs. constructing unnamed std::string objects?


So C++14 introduced a number of standard user-defined literals, one of which is the "s" literal suffix for creating std::string objects. According to the documentation, its behavior is exactly the same as constructing a std::string object, like so:

auto str = "Hello World!"s; // RHS is equivalent to: std::string{ "Hello World!", 12 }
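For the line above to compile, the s suffix has to be brought into scope first. A minimal sketch, assuming a C++14 or later compiler; the function name greeting is made up for illustration:

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// The s suffix lives in std::literals::string_literals; without a
// using-directive such as this one, "..."s does not compile.
using namespace std::string_literals;

// "..."s is a std::string, whereas a plain literal is an array of const char.
static_assert(std::is_same<decltype("Hello World!"s), std::string>::value,
              "the s suffix yields a std::string");
static_assert(std::is_same<decltype("Hello World!"), const char (&)[13]>::value,
              "a plain literal is an lvalue of type const char[13]");

// Hypothetical helper so the resulting value can be inspected.
std::string greeting() {
    auto str = "Hello World!"s;
    assert(str.size() == 12);  // 12 characters, no terminating NUL counted
    return str;
}
```

The using-directive can also be placed inside a function body if you would rather not have it at file scope.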

Of course, constructing an unnamed std::string object was possible before C++14, but because the C++14 way is so much simpler, I think far more people will consider constructing std::string objects on the spot than before, which is why I thought it made sense to ask this.

So my question is simple: In what cases is it a good (or bad) idea to construct an unnamed std::string object, instead of simply using a C-style string literal?


Example 1:

Consider the following:

void foo(std::string arg);

foo("bar");  // option 1
foo("bar"s); // option 2

If I'm correct, the first method will call the appropriate constructor overload of std::string to create an object inside foo's scope, and the second method will construct an unnamed string object first, and then move-construct foo's argument from that. I'm sure that compilers are very good at optimizing stuff like this, but still, the second version seems to involve an extra move compared with the first alternative (not that a move is expensive, of course). But again, after compiling this with a reasonable compiler, the end result is most likely to be highly optimized and free of redundant moves/copies anyway.

Also, what if foo is overloaded to accept rvalue references? In that case, I think it would make sense to call foo("bar"s), but I could be wrong.


Example 2:

Consider the following:

std::cout << "Hello World!" << std::endl;  // option 1
std::cout << "Hello World!"s << std::endl; // option 2

In this case, the std::string object is probably passed to cout's operator<< via rvalue reference, and the first option probably passes a pointer, so both are very cheap operations, but the second one has the extra cost of constructing an object first. It's probably the safer way to go, though (?).


In all cases of course, constructing an std::string object could result in a heap allocation, which could throw, so exception safety should be taken into consideration as well. This is more of an issue in the second example though, as in the first example, an std::string object will be constructed in both cases anyway. In practice, getting an exception from constructing a string object is very unlikely, but still could be a valid argument in certain cases.

If you can think of more examples to consider, please include them in your answer. I'm interested in general advice regarding the usage of unnamed std::string objects, not just these two particular cases. I only included these to point out some of my thoughts on the topic.

Also, if I got something wrong, feel free to correct me as I'm not by any means a C++ expert. The behaviors I described are only my guesses on how things work, and I didn't base them on actual research or experimenting really.


Solution

  • In what cases is it a good (or bad) idea to construct an unnamed std::string object, instead of simply using a C-style string literal?

    A std::string literal is a good idea when you specifically want a variable of type std::string, whether for

    • modifying the value later (auto s = "123"s; s += '\n';)

    • the richer, intuitive and less error-prone interface (value semantics, iterators, find, size etc)

      • value semantics means ==, <, copying etc. work on the values, unlike the pointer/by-reference semantics after C-string literals decay to const char*s
    • calling some_templated_function("123"s) would concisely ensure a <std::string> instantiation, with the argument being able to be handled using value semantics internally

      • if you know other code's instantiating the template for std::string anyway, and it's of significant complexity relative to your resource constraints, you might want to pass a std::string too, to avoid an unnecessary instantiation for const char* as well, but it's rare to need to care
    • values containing embedded NULs
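A minimal sketch of three of the points above (modification, template instantiation and embedded NULs), assuming C++14 or later. All function names here are hypothetical and exist only to make the behaviour observable:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

using namespace std::string_literals;

// Hypothetical template taking its argument by value, used only to
// observe which type is deduced for T.
template <typename T>
bool deduced_std_string(T) {
    return std::is_same<T, std::string>::value;
}

// Embedded NULs: operator""s receives the full length from the compiler,
// while the const char* constructor stops at the first NUL.
std::size_t length_with_suffix() { return "ab\0cd"s.size(); }
std::size_t length_from_c_string() { return std::string("ab\0cd").size(); }

// Value semantics: the object can be modified, and == compares contents.
std::string append_newline() {
    auto s = "123"s;
    s += '\n';
    assert(s == "123\n");
    return s;
}
```

Here length_with_suffix() gives 5 and length_from_c_string() gives 2, while deduced_std_string("123"s) is true and deduced_std_string("123") is false, because the plain literal decays to const char*.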

    A C-style string literal might be preferred where:

    • pointer-style semantics are wanted (or at least not a problem)

    • the value's only going to be passed to functions expecting const char* anyway, or std::string temporaries will get constructed anyway and you don't care that you're giving your compiler's optimiser one extra hurdle to clear before it can construct the string at compile or load time and reuse the same std::string instance (e.g. when passing to functions by const reference). Again, it's rare to need to care.

    • (another rare and nasty hack) you're somehow leveraging your compiler's string pooling behaviour, e.g. if it guarantees that for any given translation unit the const char* to string literals will only (but of course always) differ if the text differs

      • you can't really get the same from std::string .data()/.c_str(), as the same address may be associated with different text (and different std::string instances) during the program execution, and std::string buffers at distinct addresses may contain the same text
    • you benefit from having the pointer remain valid after a std::string would leave scope and be destroyed. For example, given enum My_Enum { Zero, One };, const char* str(My_Enum e) { return e == Zero ? "0" : "1"; } is safe, but const char* str(My_Enum e) { return e == Zero ? "0"s.c_str() : "1"s.c_str(); } returns a dangling pointer, and std::string str(My_Enum e) { return e == Zero ? "0"s : "1"s; } smacks of premature pessimisation in always using dynamic allocation (sans SSO, or for longer text)

    • you're leveraging compile-time concatenation of adjacent C-string literals (e.g. "abc" "xyz" becomes one contiguous const char[] literal "abcxyz") - this is particularly useful inside macro substitutions

    • you're memory constrained and/or don't want to risk an exception or crash during dynamic memory allocation
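The pointer-validity and compile-time concatenation points can be sketched as follows. The str function and My_Enum come from the bullet above; joined, greet_world and the GREETING_PREFIX macro are hypothetical names added for illustration:

```cpp
#include <cassert>
#include <cstring>

enum My_Enum { Zero, One };

// Safe: string literals have static storage duration, so the returned
// pointer stays valid for the lifetime of the program.
const char* str(My_Enum e) { return e == Zero ? "0" : "1"; }

// Adjacent literals are merged into one array during translation.
const char* joined() { return "abc" "xyz"; }
static_assert(sizeof("abc" "xyz") == 7, "six characters plus the NUL");

// The same mechanism works when one piece comes from a macro.
#define GREETING_PREFIX "Hello, "
const char* greet_world() { return GREETING_PREFIX "World!"; }

// Checks the three behaviours described above.
void check_literals() {
    assert(std::strcmp(str(Zero), "0") == 0);
    assert(std::strcmp(joined(), "abcxyz") == 0);
    assert(std::strcmp(greet_world(), "Hello, World!") == 0);
}
```

None of these functions allocates, so none of them can throw from an allocation failure.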

    Discussion

    [basic.string.literals] 21.7 lists:

    string operator "" s(const char* str, size_t len);

    Returns: string{str,len}

    Basically, using ""s is calling a function that returns a std::string by value - crucially, you can bind the result to a const reference or an rvalue reference, but not to a non-const lvalue reference.
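The binding rules can be checked at compile time. This is a small sketch, assuming C++14 or later, that uses std::is_convertible as a stand-in for "can this reference type be initialised from the temporary"; the two function names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

using namespace std::string_literals;

// decltype("bar"s) is std::string, i.e. the expression is a temporary.
static_assert(std::is_convertible<decltype("bar"s), const std::string&>::value,
              "a const lvalue reference can bind to the temporary");
static_assert(std::is_convertible<decltype("bar"s), std::string&&>::value,
              "an rvalue reference can bind to the temporary");
static_assert(!std::is_convertible<decltype("bar"s), std::string&>::value,
              "a non-const lvalue reference cannot bind to the temporary");

// Binding to a const reference extends the temporary's lifetime
// to the lifetime of the reference.
std::size_t length_via_const_ref() {
    const std::string& r = "bar"s;
    return r.size();
}

// An rvalue reference extends the lifetime too, and allows modification.
std::string append_via_rvalue_ref() {
    std::string&& r = "bar"s;
    r += "!";
    assert(r == "bar!");
    return r;
}
```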

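    When used to call void foo(std::string arg);, arg is indeed initialised from the temporary: conceptually that is a move construction, although compilers may elide the move, and since C++17 the temporary initialises arg directly with no move at all.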

    Also, what if foo is overloaded to accept rvalue references? In that case, I think it would make sense to call foo("bar"s), but I could be wrong.

    It doesn't matter much which you choose. Maintenance-wise, if foo(const std::string&) is ever changed to foo(const char*), only the foo("xyz"); invocations will keep working seamlessly, but there are very few even vaguely plausible reasons for such a change: so that C code could call it too (though it would still be a bit mad not to keep providing a foo(const std::string&) overload for existing client code); so that it could be implemented in C (perhaps); or to remove the dependency on the <string> header (irrelevant with modern computing resources).
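To illustrate why the choice makes little difference when an rvalue-reference overload exists, here is a sketch with a hypothetical overload set. These versions of foo return a label purely so the selected overload can be observed, and the call_with_* helpers are likewise invented:

```cpp
#include <cassert>
#include <string>

using namespace std::string_literals;

// Hypothetical overloads: each reports which one was selected.
const char* foo(const std::string&) { return "const&"; }
const char* foo(std::string&&) { return "&&"; }

// A plain literal is first converted to a temporary std::string,
// which then binds to the rvalue-reference overload.
const char* call_with_literal() { return foo("bar"); }

// The s suffix produces the temporary directly; same overload.
const char* call_with_suffix() { return foo("bar"s); }

// Only a named std::string (an lvalue) selects the const& overload.
const char* call_with_named() {
    std::string s = "bar";
    assert(s.size() == 3);
    return foo(s);
}
```

Both literal forms end up in the && overload; only the named object takes the const& route.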

    std::cout << "Hello World!" << std::endl; // option 1

    std::cout << "Hello World!"s << std::endl; // option 2

    The former will call operator<<(std::ostream&, const char*), directly accessing the constant string literal data, with the only disadvantage being that the streaming may have to scan for the terminating NUL. "option 2" would match a const-reference overload and implies construction of a temporary, though compilers might be able to optimise it so they're not doing that unnecessarily often, or even effectively create the string object at compile time (which might only be practical for strings short enough to use an in-object Short String Optimisation (SSO) approach). If they're not doing such optimisations already, the potential benefit and hence pressure/desire to do so is likely to increase.
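One observable difference between the two overloads follows from the NUL scan mentioned above. This sketch streams into a std::ostringstream rather than std::cout so the result can be inspected; the function names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

using namespace std::string_literals;

// Option 1: the const char* overload writes up to the terminating NUL.
std::string stream_c_literal() {
    std::ostringstream os;
    os << "Hello World!";
    return os.str();
}

// Option 2: the std::string overload writes size() characters.
std::string stream_std_string() {
    std::ostringstream os;
    os << "Hello World!"s;
    return os.str();
}

// With an embedded NUL the two overloads write different amounts.
std::size_t streamed_length_c_literal() {
    std::ostringstream os;
    os << "ab\0cd";   // stops at the first NUL
    return os.str().size();
}

std::size_t streamed_length_std_string() {
    std::ostringstream os;
    os << "ab\0cd"s;  // the length is known, so all characters are written
    assert(os.good());
    return os.str().size();
}
```

For ordinary text both options produce identical output; with the embedded NUL, the first writes 2 characters and the second writes 5.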