c++ c++17 language-lawyer string-literals

C++17 user-defined literals: strange behavior for a sequence of strings


I spent some time diagnosing a bug where I had lost the comma between two strings created with std::string_literals.
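As a minimal sketch of the kind of bug described (the functions with_missing_comma and with_comma are hypothetical, not from the original code), this shows how a missing comma silently turns two std::string_literals elements into one:

```cpp
#include <string>
#include <vector>

using namespace std::string_literals;

// Intended: a list of two strings. Because the comma is missing, the two
// adjacent string literals are merged into the single literal "abc123"s,
// so the vector ends up with one element instead of two.
std::vector<std::string> with_missing_comma()
{
    return {
        "abc"s   // comma accidentally omitted here
        "123"s
    };
}

// The same list with the comma in place: two separate elements.
std::vector<std::string> with_comma()
{
    return {
        "abc"s,
        "123"s
    };
}
```

Both versions compile, which is what makes the mistake easy to miss: with_missing_comma() returns one element, "abc123", while with_comma() returns two.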

EDIT: Since many commenters focus on the clear-cut case where "one" "two" is a single string, I want to emphasize that this is about user-defined literals. For this purpose I've added two extra operators: _kylobyte and my_ns::s (I know it is bad practice to use a suffix that does not start with '_').

EDIT 2: Okay, @user17732522 argues that this is fundamental lexical behavior. So I've added one more user-defined string literal operator, _boom. The compiler joins adjacent literals that have the same user-defined suffix, but behaves as I expected (it refuses to compile) when two different suffixes are mixed. So it seems that the lexical rules alone are not the whole story.
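To make the "joins adjacent literals with the same suffix" observation concrete, here is a small sketch reusing the shape of the question's _boom operator (moved to global scope for brevity; same_suffix is a hypothetical helper). If the operator were called once per literal, the result would contain two '*' characters; it contains only one, because the operator receives the already-concatenated literal:

```cpp
#include <cstddef>
#include <string>

// Same behavior as the question's my_ns::operator "" _boom:
// appends a single '*' to whatever literal it receives.
inline std::string operator""_boom(const char* str, std::size_t n)
{
    return std::string{str, n} + "*";
}

// The two adjacent literals form one literal, "123abc"_boom, so the
// operator runs once and only one '*' is appended.
inline std::string same_suffix()
{
    return "123"_boom "abc"_boom;
}

// Mixing different suffixes in one concatenation is ill-formed, so a line
// like the following (with a hypothetical suffix _other) would not compile:
//     "123"_boom "abc"_other;
```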

A minimal reproducible example follows:

#include <cstddef>
#include <iostream>
#include <string>

template <class ...Tx>
void consume_all(Tx&&  ...tx)
{
    ((std::cout << tx),...);
}
namespace my_ns
{
    // Suffix without a leading '_' is reserved for the standard library; used here on purpose.
    inline std::string operator "" s(const char* str, std::size_t n)
    {
        return std::string{str, n} + "+";
    }
    inline std::string operator "" _boom(const char* str, std::size_t n)
    {
        return std::string{str, n} + "*";
    }
}
inline  std::string operator "" _kylobyte( unsigned long long int c)
{
    return std::to_string(c*1000);
}
int main()
{
    using namespace my_ns; // note: my_ns, NOT std::string_literals
    consume_all(
        "abc"s // comma is missing here
        "123"s
        , // without this comma, "123"s followed by "123"_boom fails to compile
        "123"_boom
        "abc"_boom  // still no comma needed
        , // the compiler needs this one
        12_kylobyte
    );
}

The code compiles without any errors on MSVC 2022 and Clang 6.0. So my question is: which part of the specification describes this behavior?


Solution

  • This is not specific to user-defined string literals.

    You can (and always could, already in C) concatenate multiple string literals by writing them directly after one another, possibly separated by whitespace.

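    The user-defined literal suffixes must not conflict (i.e. the concatenated literals must not carry two different suffixes), but otherwise concatenation works with user-defined literals the same way. This consistency check is part of the same concatenation step, which is why "123"s directly followed by "123"_boom is rejected. (See [lex.string]/8 together with [lex.ext]/8.)

            "abc"s // comma is missing here
            "123"s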
    

    is exactly equivalent to

            "abc123"s
    

    The concatenation happens after preprocessing directives are executed, but before preprocessing tokens are converted into tokens and before any other analysis of the source code. In particular, it happens before the string literal is rewritten into a call to the user-defined string literal operator. See translation phase 6 in [lex.phases]/1.6.


    12_kylobyte is a user-defined integer literal, not a (user-defined) string literal, so no concatenation rule applies to it. If you leave out the comma between the preceding string literal and 12_kylobyte, you end up with one literal directly followed by another in an expression, which the expression grammar does not permit.
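The phase-ordering argument can be checked at compile time with a small sketch. The operators _len and _kilo and the macro TAIL are hypothetical, made up for this illustration. _len simply returns the length the compiler passes for the literal, so a result of 6 shows that the operator saw the single concatenated literal "abc123". The sketch also assumes, per [lex.ext]/8, that a literal without a suffix may take part in the concatenation, in which case the common suffix applies to the whole result, and that a literal produced by macro expansion (phase 4) is concatenated like any other in phase 6:

```cpp
#include <cstddef>

// Hypothetical string literal operator: returns the literal's length n.
constexpr std::size_t operator""_len(const char*, std::size_t n)
{
    return n;
}

// Hypothetical macro: expanded in phase 4, before concatenation in phase 6.
#define TAIL "123"_len

// One call on the concatenated literal "abc123".
static_assert("abc"_len "123"_len == 6, "single concatenated literal");

// A plain literal may participate; the suffix applies to the whole result.
static_assert("abc" "123"_len == 6, "suffix applied to the result");

// The literal produced by the macro is concatenated like any other.
static_assert("abc" TAIL == 6, "concatenated after macro expansion");

// Hypothetical user-defined integer literal, analogous to 12_kylobyte:
// concatenation never applies to it, so it needs its own comma in a list.
constexpr unsigned long long operator""_kilo(unsigned long long v)
{
    return v * 1000;
}

static_assert(12_kilo == 12000, "integer literal operator");
```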