Tags: c++, c++17, string-concatenation, string-view

Concatenating string_view objects


I've been adding std::string_view to some old code to represent string-like config params, since it provides a read-only view and is faster because nothing needs to be copied.

However, one cannot concatenate two string_views, as operator+ isn't defined for them. I see this question has a couple of answers stating it's an oversight and that there is a proposal to add it. However, that proposal is for adding a string and a string_view; presumably, if it gets implemented, the resulting concatenation would be a std::string.

Would adding two string_views also fall in the same category? And if not, why shouldn't adding two string_views be supported?

Sample

std::string_view s1{"concate"};
std::string_view s2{"nate"};
std::string_view s3{s1 + s2};

And here's the error

error: no match for 'operator+' (operand types are 'std::string_view' {aka 'std::basic_string_view<char>'} and 'std::string_view' {aka 'std::basic_string_view<char>'})

Solution

  • A std::string_view is an alias for std::basic_string_view<char>, which is a std::basic_string_view templated on a specific type of character, i.e. char.

    But what does it look like?

    Besides a fairly large number of useful member functions such as find, substr, and others (though perhaps it's an ordinary number compared to the other containers and string-like types in the standard library), libstdc++'s std::basic_string_view<_CharT>, with _CharT being the generic char-like type, has just two data members:

    // directly from my /usr/include/c++/12.2.0/string_view
          size_t        _M_len;
          const _CharT* _M_str;
    

    i.e. a pointer to const _CharT to indicate where the view starts, and a size_t (an unsigned integer type) to indicate how long the view is, counting from _M_str's pointee.

    In other words, a string view just knows where it starts and how long it is, so it represents a sequence of char-like entities which are contiguous in memory. With just two such members, you can't represent a string which is made up of non-contiguous substrings.
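The "pointer plus length" picture can be observed through the public interface, since data() and size() expose exactly those two pieces of information. The following is a minimal sketch, not taken from the original answer; the helper names offset_in and views_share_buffer are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: distance, in characters, from the start of `whole`
// to the start of `part`. Assumes `part` was carved out of `whole`.
std::size_t offset_in(std::string_view whole, std::string_view part) {
    assert(part.size() <= whole.size());  // sanity check on the assumption
    return static_cast<std::size_t>(part.data() - whole.data());
}

// Returns true if views built from one std::string point into its buffer.
bool views_share_buffer() {
    std::string owner{"concatenate"};
    std::string_view whole{owner};            // owner's buffer, length 11
    std::string_view tail = whole.substr(7);  // same buffer, start moved by 7

    return whole.data() == owner.data()
        && whole.size() == owner.size()
        && offset_in(whole, tail) == 7
        && tail.size() == 4
        && tail == "nate";
}
```

Note that substr on a view copies no characters: it only produces a new (pointer, length) pair into the same buffer.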

    Put yet another way, if you want to create a std::string_view, you need to be able to tell where it starts and how many chars long it is. Can you tell where s1 + s2 would have to start and how many characters long it should be? Think about it: you can't, because s1 and s2 are not adjacent.
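To make the adjacency point concrete, here is a hedged sketch of the one case where two views can be merged into one: when the second starts exactly where the first ends inside the same buffer. The helper try_join and the function join_demo are hypothetical names, not part of the standard library, and the "same buffer" precondition is an assumption the caller has to guarantee, because it cannot be verified from the views alone.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: joins two views only if `b` starts exactly where `a`
// ends. Assumption: the caller knows both views refer to the same buffer.
std::optional<std::string_view> try_join(std::string_view a, std::string_view b) {
    if (a.data() + a.size() != b.data()) {
        return std::nullopt;  // no single (pointer, length) pair covers both
    }
    return std::string_view{a.data(), a.size() + b.size()};
}

// Usage sketch: adjacent pieces of one string join, separate strings do not.
void join_demo() {
    std::string owner{"concatenate"};
    std::string_view whole{owner};

    auto joined = try_join(whole.substr(0, 7), whole.substr(7));
    assert(joined && *joined == "concatenate");

    std::string s1{"concate"};
    std::string s2{"nate"};
    assert(!try_join(s1, s2));  // two separate buffers, nothing to point at
}
```

A general operator+ for two string_views has no such guarantee to rely on, which is why it could only ever return an owning type such as std::string.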

    Maybe a diagram can help.

    Assume these lines of code

    std::string s1{"hello"};
    std::string s2{"world"};
    

    s1 and s2 are totally unrelated objects, as far as their memory location is concerned; here is what they look like:

                               &s2[0]
                                 |
                                 | &s2[1]
                                 |   |
    &s1[0]                       |   | &s2[2]
      |                          |   |   |
      | &s1[1]                   |   |   | &s2[3]
      |   |                      |   |   |   |
      |   | &s1[2]               |   |   |   | &s2[4]
      |   |   |                  |   |   |   |   |
      |   |   | &s1[3]           v   v   v   v   v
      |   |   |   |            +---+---+---+---+---+
      |   |   |   | &s1[4]     | w | o | r | l | d |
      |   |   |   |   |        +---+---+---+---+---+
      v   v   v   v   v
    +---+---+---+---+---+
    | h | e | l | l | o |
    +---+---+---+---+---+
    

    I've intentionally drawn them misaligned to mean that &s1[0], the memory location where s1 starts, and &s2[0], the memory location where s2 starts, have nothing to do with each other.

    Now, imagine you create two string views like this:

    std::string_view sv1{s1};
    std::string_view sv2(s2.data() + 1, 3);  // "orl"; the iterator-pair constructor would need C++20
    

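    Here's what they will look like, in terms of libstdc++'s two internal members _M_str and _M_len (the diagram labels sv1's members as s1._M_str and s1._M_len, and sv2's members as s2._M_str and s2._M_len):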

                                    &s2[0]
                                      |
                                      | &s2[1]
                                      |   |
         &s1[0]                       |   | &s2[2]
           |                          |   |   |
           | &s1[1]                   |   |   | &s2[3]
           |   |                      |   |   |   |
           |   | &s1[2]               |   |   |   | &s2[4]
           |   |   |                  |   |   |   |   |
           |   |   | &s1[3]           v   v   v   v   v
           |   |   |   |            +---+---+---+---+---+
           |   |   |   | &s1[4]     | w | o | r | l | d |
           |   |   |   |   |        +---+---+---+---+---+
           v   v   v   v   v            · ^         ·
         +---+---+---+---+---+          · |         ·
         | h | e | l | l | o |        +---+         ·
         +---+---+---+---+---+        | ·           ·
         · ^                 ·        | · s2._M_len ·
         · |                 ·        | <----------->
       +---+                 ·        |
       | ·                   ·        +-- s2._M_str
       | ·       s1._M_len   ·
       | <------------------->
       |
       +-------- s1._M_str
    

    Given the above, can you see what's wrong with expecting that

    std::string_view s3{s1 + s2};
    

    works?

    How can you possibly define s3._M_str and s3._M_len (based on s1._M_str, s1._M_len, s2._M_str, and s2._M_len), such that they represent a view on "helloworld"?

    You can't, because "hello" and "world" are located in two unrelated areas of memory.
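Since no view can cover both pieces, the concatenated characters have to be copied into storage that owns them. Below is a minimal sketch of that workaround, assuming C++17; the names concat and concat_demo are hypothetical and only illustrate the idea.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper: copies both views into a new std::string that owns
// the resulting characters.
std::string concat(std::string_view a, std::string_view b) {
    std::string result;
    result.reserve(a.size() + b.size());  // avoid growing the buffer twice
    result.append(a);  // std::string::append accepts a string_view since C++17
    result.append(b);
    return result;
}

// Usage sketch with the views from the question.
void concat_demo() {
    std::string_view s1{"concate"};
    std::string_view s2{"nate"};
    std::string s3 = concat(s1, s2);  // keep the owner alive as a std::string
    assert(s3 == "concatenate");

    std::string_view view_of_s3{s3};  // valid for as long as s3 is
    assert(view_of_s3.size() == 11);
}
```

The result should be stored as a std::string. Writing std::string_view s3{concat(s1, s2)} would leave the view pointing into a temporary that is destroyed at the end of the statement.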