Tags: c++, std-ranges, c++23

Behavior differences using string objects as delimiters in ranges::views::split


I've been testing the ranges library with both GCC 13 and Clang 17 and came across an unexpected result when trying out std::views::split.

I checked cppreference and noticed this in the Notes section for split_view:

The delimiter pattern generally should not be an ordinary string literal, as it will consider the null terminator to be necessary part of the delimiter; therefore, it is advisable to use a std::string_view literal instead.

I wondered whether the issue with string literals also applies to std::string objects, and I got some inconsistent results.

Here is the exact code I tested:

#include <algorithm>
#include <iostream>
#include <ranges>
#include <string>
#include <string_view>

int main()
{
    auto print_results = [](const auto& x)
    {
        std::cout << std::string_view(x) << " ";
    };

    const std::string input("A1 A2 A3 A4\nB1 B2 B3 B4\nC1 C2 C3 C4\n");
    const std::string new_line("\n");

    const std::string whitespace(" ");
    auto GOOD_remove_whitespaces_v1 = [&whitespace](const auto& x)
    {   
        return std::views::split(x, whitespace);
    };

    std::ranges::for_each(input | std::views::split(new_line)
                                | std::views::transform(GOOD_remove_whitespaces_v1)
                                | std::views::join
                                | std::views::drop(1)
                                | std::views::stride(2), print_results);
    std::cout << " -> good v1\n";   


    auto GOOD_remove_whitespaces_v2 = [](const auto& x)
    {   
        return std::views::split(x, std::string(" "));
    };

    std::ranges::for_each(input | std::views::split(new_line)
                                | std::views::transform(GOOD_remove_whitespaces_v2)
                                | std::views::join
                                | std::views::drop(1)
                                | std::views::stride(2), print_results);
    std::cout << " -> good v2\n";
                                

    // this produces unexpected results
    auto BAD_remove_whitespaces_v1 = [](const auto& x)
    {   
        const std::string in_whitespace(" ");
        return std::views::split(x, in_whitespace);
    };

    std::ranges::for_each(input | std::views::split(new_line)
                                | std::views::transform(BAD_remove_whitespaces_v1)
                                | std::views::join
                                | std::views::drop(1)
                                | std::views::stride(2), print_results);                                                   
    std::cout << " -> bad v2\n";

    return 0;
}

and here is the output:

A2 A4 B2 B4 C2 C4  -> good v1
A2 A4 B2 B4 C2 C4  -> good v2
B1 B2 B3 B4  -> bad v2

Clearly, the preferred way is to use string views for delimiters, as cppreference advises, but I'm curious why std::string objects give this unexpected result. I'm not sure it's the null-terminator issue mentioned in the notes for split_view, since whether it works seems to depend on where and how the string object is created. Is some implicit conversion to a string view happening somewhere that makes the first two for_each calls work, or is this just undefined behavior?

Thanks in advance.


Solution

  • Your last attempt is undefined behavior.

    Since the pattern range in_whitespace is an lvalue, views::split stores it by reference, wrapped in a ref_view. Because in_whitespace is a local variable of the lambda, it is destroyed when the lambda returns, so the returned split_view holds a dangling reference.

    Declare it static to extend its lifetime to the end of the program, and you will get consistent results.

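    The second attempt works because std::string(" ") is a prvalue, so it is moved into the split_view wrapped in an owning_view; the split_view therefore stores the pattern by value. The first attempt is also fine: whitespace is an lvalue and is held through a ref_view as well, but it lives in main and outlives the whole pipeline. No conversion to std::string_view happens in any of the three cases, and the null-terminator caveat from cppreference does not apply to std::string, whose range [begin(), end()) does not include the terminating '\0'.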