c++ string templates split rvalue-reference

Reviewing the implementation of a function that splits character types


I needed a helper for splitting a string by a delimiter, so I wrote this code:

import std;

template <typename T>
concept SplittableString =
    std::is_same_v<const char*, std::remove_reference_t<T>> ||
    std::is_same_v<std::string, std::remove_reference_t<T>> ||
    std::is_same_v<std::string_view, std::remove_reference_t<T>>;


template <SplittableString T>
auto split_str(T&& str, const char* delimiter) -> std::vector<std::string> {
    std::vector<std::string> tokens;

    std::istringstream iss;
    if constexpr (std::is_same_v<std::remove_reference_t<T>, std::string_view>)
        iss = std::istringstream(std::string {str});
    else
        iss = std::istringstream(std::forward<T>(str));

    std::string token;
    while (std::getline(iss, token, *delimiter))
        tokens.push_back(token);

    return tokens;
}


int main() {
    const char* s1 = "I am a pointer to a (const) char";
    auto splitted_s1 = split_str(s1, " ");
    std::cout << "s1: ";
    for (auto& spl : splitted_s1)
        std::cout << spl << " ";
    std::cout << "\n";

    std::string s2 = "I am an std::string!";
    auto splitted_s2 = split_str(s2, " ");
    std::cout << "s2: ";
    for (auto& spl : splitted_s2)
        std::cout << spl << " ";
    std::cout << "\n";

    std::string_view s3 {"I am an std::string_view!"};
    auto splitted_s3 = split_str(s3, " ");
    std::cout << "s3: ";
    for (auto& spl : splitted_s3)
        std::cout << spl << " ";
    std::cout << "\n";
}

But I am not sure about a lot of things, nor am I aware of the potential pitfalls of this code.

The only constraint is that the argument can be a const char*, a std::string or a std::string_view. So, those three types are the splittable character types.

Can someone review this code and point out everything I am doing incorrectly?

In particular, I am not sure about the iss assignment part in the if constexpr block.

Here's a live working example: https://godbolt.org/z/j8dnvs8d7

EDIT: Bonus. Why are all the types deduced as lvalue references when they bind to the str parameter? Is it because all parameters of a function are themselves lvalues?


Solution

  • This is a good implementation, but maybe too complex.

    You may take advantage of 4 properties:

    1. std::string has many constructors. In particular, it can be constructed from another std::string, from a std::string_view and also from a const char *.
    2. std::vector also has a range constructor (number 5 in the list), so it can be constructed from a "begin" / "end" iterator pair.
    3. There is a dedicated iterator, std::sregex_token_iterator, for iterating over the matches of a pattern in a std::string or, conversely, over the parts between the matches, which treats the pattern as a delimiter.
    4. We can write generic lambdas

    If we combine all the above, we can come up with a simple lambda, that will do the job for you.

    Please see the following short code example:

    #include <iostream>
    #include <iterator>
    #include <regex>
    #include <string>
    #include <string_view>
    #include <vector>
    
    // The delimiter pattern: a single space, written as a raw string literal
    const std::regex delimiter{ R"( )" };
    
    int main() {
        // Generic lambda: copy the argument into a std::string, then build the
        // vector from the token iterator range (-1 selects the parts between matches)
        auto split = [](const auto& str) {
            std::string s(str);
            std::vector<std::string> v(std::sregex_token_iterator(s.begin(), s.end(), delimiter, -1), {});
            return v;
        };
    
        // Test Data
        std::string s{ "aaa bbb ccc" };
        std::string_view sv{ "ddd eee fff" };
        const char* c = "ggg hhh iii";
    
        // Debug output
        for (const auto& p : split(s)) std::cout << p << ' ';
        std::cout << '\n';
        for (const auto& p : split(sv)) std::cout << p << ' ';
        std::cout << '\n';
        for (const auto& p : split(c)) std::cout << p << ' ';
        std::cout << '\n';
    }
    

    Let us have a look.

    The lambda is generic, so it can be called with all kinds of arguments. Inside the lambda, we create a local std::string from the parameter. This works for all 3 of your required types because of the existing constructors. So now we have a std::string containing the provided data.
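    As a minimal sketch of this first step in isolation, the helper below copies any of the three types into an owning std::string. The name to_owned_string is hypothetical and only used for illustration; it is not part of the code above.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper for illustration: copies any of the three
// "splittable" types into an owning std::string.
template <typename T>
std::string to_owned_string(const T& str) {
    // Direct initialization is used on purpose: the std::string
    // constructor that accepts a std::string_view is explicit.
    std::string s(str);
    return s;
}
```

    Because the std::string_view constructor is explicit, copy initialization such as std::string s = sv; would not compile, which is why the lambda uses std::string s(str).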

    Next, we define a std::vector<std::string> and use its constructor No. 5, so we provide a begin iterator and an end iterator.

    The iterator that we use is std::sregex_token_iterator. It also has several constructors; we only need No. 1 and No. 2.

    No. 1 is the default constructor. As you can read on cppreference:

    Default constructor. Constructs the end-of-sequence iterator.

    No. 2 takes the begin- and end-iterator of our temporary string, a regex, and then -1 as the submatch. Again from the CPP-reference:

    submatch - the index of the submatch that should be returned. "0" represents the entire match, and "-1" represents the parts that are not matched (e.g, the stuff between matches).
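    To see the effect of the submatch argument, here is a small sketch. The helper name collect_tokens is hypothetical and not part of the answer's code; it simply collects whatever the token iterator yields for a given submatch index.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper for illustration: collects the tokens that
// std::sregex_token_iterator produces for the given submatch index.
std::vector<std::string> collect_tokens(const std::string& text,
                                        const std::regex& pattern,
                                        int submatch) {
    // The default-constructed iterator acts as the end-of-sequence marker.
    return std::vector<std::string>(
        std::sregex_token_iterator(text.begin(), text.end(), pattern, submatch),
        std::sregex_token_iterator());
}
```

    With the text "aaa bbb ccc" and a regex matching a single space, submatch -1 yields the three words, while submatch 0 yields the two matched spaces.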

    So, now we know how to use both iterators. When we construct the std::vector, we give it:

    std::sregex_token_iterator(s.begin(), s.end(), delimiter, -1) as the begin iterator and {} as the default-constructed iterator, that is, the end iterator.

    Now we have all the data that we need in our vector, just by defining it and using its constructor.

    And that's it. We return the std::vector with the string parts and can work with them.
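    The first property can also be applied to the stream-based approach from the question. The following is only a sketch under that assumption: split_str_simple is a hypothetical name, and it takes the delimiter as a single char rather than a const char*. Constructing the std::string explicitly handles all three types the same way, so the if constexpr branch is no longer needed.

```cpp
#include <sstream>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical variant of the question's split_str: one explicit
// std::string construction replaces the if constexpr branch.
template <typename T>
std::vector<std::string> split_str_simple(const T& str, char delimiter) {
    std::vector<std::string> tokens;

    // Works for const char*, std::string and std::string_view alike.
    std::istringstream iss{std::string(str)};

    std::string token;
    while (std::getline(iss, token, delimiter))
        tokens.push_back(token);

    return tokens;
}
```

    This always copies the input once, which is the same cost the lambda above pays for its local std::string.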