I needed a helper for splitting a string by a delimiter, so I wrote this code:
import std;

template <typename T>
concept SplittableString =
    std::is_same_v<const char*, std::remove_reference_t<T>> ||
    std::is_same_v<std::string, std::remove_reference_t<T>> ||
    std::is_same_v<std::string_view, std::remove_reference_t<T>>;

template <SplittableString T>
auto split_str(T&& str, const char* delimiter) -> std::vector<std::string> {
    std::vector<std::string> tokens;
    std::istringstream iss;
    // std::istringstream cannot be built from a std::string_view directly
    if constexpr (std::is_same_v<std::remove_reference_t<T>, std::string_view>)
        iss = std::istringstream(std::string {str});
    else
        iss = std::istringstream(std::forward<T>(str));
    std::string token;
    // Only the first character of delimiter is used
    while (std::getline(iss, token, *delimiter))
        tokens.push_back(token);
    return tokens;
}

int main() {
    const char* s1 = "I am a pointer to a (const) char";
    auto splitted_s1 = split_str(s1, " ");
    std::cout << "s1: ";
    for (auto& spl : splitted_s1)
        std::cout << spl << " ";
    std::cout << "\n";

    std::string s2 = "I am an std::string!";
    auto splitted_s2 = split_str(s2, " ");
    std::cout << "s2: ";
    for (auto& spl : splitted_s2)
        std::cout << spl << " ";
    std::cout << "\n";

    std::string_view s3 {"I am an std::string_view!"};
    auto splitted_s3 = split_str(s3, " ");
    std::cout << "s3: ";
    for (auto& spl : splitted_s3)
        std::cout << spl << " ";
    std::cout << "\n";
}
But I am not sure about a lot of things, and I am not aware of the potential pitfalls of this code.
The only constraint is that the argument can be a const char*, a std::string, or a std::string_view. So, those three types are the splittable string types.
Can someone review this code and point out everything I am doing incorrectly?
In particular, I am not sure about the iss assignment in the if constexpr block.
Here's a live working example: https://godbolt.org/z/j8dnvs8d7
EDIT: Bonus. Why are all the types deduced as lvalue references when they bind to the str parameter? Is it because all function parameters are themselves lvalues?
This is a good implementation, but maybe too complex.
You may take advantage of 4 properties:
1. std::string has many constructors. In particular, it can be constructed from another std::string, from a std::string_view, and also from a const char*.
2. std::vector also has a range constructor, number 5 in the list of constructors. So, it can be constructed from a "begin" / "end" iterator pair.
3. std::sregex_token_iterator iterates over the regex matches in a std::string.
4. With it, you can specify what you are looking for in the std::string, or, in a negative way, use a delimiter.
If we combine all of the above, we can come up with a simple lambda that will do the job for you.
Please see the following short code example:
#include <iostream>
#include <iterator>
#include <regex>
#include <string>
#include <string_view>
#include <vector>

// Raw string literal: the delimiter regex is a single space
const std::regex delimiter{ R"( )" };

int main() {
    // Generic lambda: copy the argument into a std::string, then split that copy
    auto split = [&](auto& str) {
        std::string s(str);
        std::vector<std::string> v(std::sregex_token_iterator(s.begin(), s.end(), delimiter, -1), {});
        return v;
    };

    // Test Data
    std::string s{ "aaa bbb ccc" };
    std::string_view sv{ "ddd eee fff" };
    const char* c = "ggg hhh iii";

    // Debug output
    for (const auto& p : split(s)) std::cout << p << ' '; std::cout << '\n';
    for (const auto& p : split(sv)) std::cout << p << ' '; std::cout << '\n';
    for (const auto& p : split(c)) std::cout << p << ' '; std::cout << '\n';
}
Let us have a look.
The lambda is generic, so it can be called with all kinds of arguments. Inside the lambda, we create a local std::string from the parameter. This works for all 3 of your required types because of the existing constructors. So now we have a std::string containing the provided data.
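To illustrate only this first step, here is a minimal sketch. The helper name make_owned is hypothetical and not part of the code above; it simply isolates the construction of the std::string.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper (name chosen for illustration only): build an owning
// std::string from any of the three supported argument types.
template <typename T>
std::string make_owned(const T& str) {
    // Explicit construction is needed here, because the std::string
    // constructor taking a std::string_view is explicit.
    return std::string(str);
}
```

Called with a std::string, a std::string_view, or a const char* holding the same text, this should return equal strings in all three cases. A null const char* is not handled and must not be passed.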
Next, we define a std::vector<std::string> and use its constructor no. 5. So, we provide a begin iterator and an end iterator.
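The iterator-pair constructor is not specific to regex iterators. As a small sketch, with a hypothetical helper named collect, any range whose elements can be turned into a std::string can fill the vector:

```cpp
#include <list>
#include <string>
#include <vector>

// Hypothetical helper (illustration only): copy an iterator range into a
// std::vector<std::string> through the vector's iterator-pair constructor.
template <typename Iterator>
std::vector<std::string> collect(Iterator first, Iterator last) {
    return std::vector<std::string>(first, last);
}
```

For example, collect(words.begin(), words.end()) on a std::list<std::string> copies every element, in order, into the new vector. The lambda above does the same thing, only with a different iterator type.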
The iterator that we will use is the std::sregex_token_iterator. This iterator also has several constructors. We only need No. 1 and No. 2.
No. 1 is the default constructor. As you can read on cppreference:
Default constructor. Constructs the end-of-sequence iterator.
No. 2 takes the begin and end iterators of our local string, a regex, and then -1 as the submatch. Again from cppreference:
submatch - the index of the submatch that should be returned. "0" represents the entire match, and "-1" represents the parts that are not matched (e.g, the stuff between matches).
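The difference between the two submatch values is easy to see with a small experiment. The helper tokens_for below is a hypothetical name used only for this sketch; it collects whatever the iterator yields for a given submatch index.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (illustration only): collect all tokens that
// std::sregex_token_iterator produces for the given submatch index.
std::vector<std::string> tokens_for(const std::string& s,
                                    const std::regex& re,
                                    int submatch) {
    return std::vector<std::string>(
        std::sregex_token_iterator(s.begin(), s.end(), re, submatch),
        std::sregex_token_iterator());  // default-constructed: end of sequence
}
```

With a regex that matches a single space and the input "aaa bbb ccc", submatch -1 yields the three words aaa, bbb and ccc, while submatch 0 yields the two matched spaces. That is why -1 is the value to use when the regex describes the delimiter.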
So now we know how to use both iterators. When we construct the std::vector, we give it:
std::sregex_token_iterator(s.begin(), s.end(), delimiter, -1)
as the begin iterator and {} as the default-constructed iterator, so the end iterator.
Now we have all the data that we need in our vector, just by defining it and using its constructor.
And that's it. We return the std::vector with the string parts and can work with them.
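If you prefer a named function over the lambda, one possible variation is sketched below. It is not the code from above: the name split_by is hypothetical, the parameter is a std::string_view (all three of your types convert to it implicitly), and it swaps in std::cregex_token_iterator, which iterates over const char* ranges, so that no intermediate std::string copy is needed.

```cpp
#include <regex>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical variation (illustration only): split text at every match of
// the delimiter regex. std::string, std::string_view and const char* all
// convert implicitly to the std::string_view parameter.
std::vector<std::string> split_by(std::string_view text,
                                  const std::regex& delimiter) {
    const char* first = text.data();
    const char* last = text.data() + text.size();
    return std::vector<std::string>(
        std::cregex_token_iterator(first, last, delimiter, -1),
        std::cregex_token_iterator());  // end-of-sequence iterator
}
```

The caller has to keep the regex object alive during the call, and a null const char* must not be passed. Whether this is faster than copying into a std::string first would have to be measured.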