I am splitting my string based on two delimiters so far, but I would like to extend this to a possibility where the number of delimiters is variable. Right now, I have this function:
void dac_sim::dac_ifs::dac_sim_subcmd_if::parse_cmd(std::string command, std::array<std::string, 2> delimiters)
{
    std::string str = command;
    std::vector<std::string> vec;
    auto it = str.begin(), end = str.end();
    bool res = boost::spirit::qi::parse(it, end,
        boost::spirit::qi::as_string[ *(boost::spirit::qi::char_ - delimiters[0] - delimiters[1]) ]
            % (boost::spirit::qi::lit(delimiters[0]) | boost::spirit::qi::lit(delimiters[1])),
        vec);

    std::cout << "Parsed:";
    for (auto const& s : vec)
        std::cout << " \"" << s << "\"";
    std::cout << std::endl;
}
But now I want something more generic, via template for the array size, like this:
template <size_t N>
void dac_sim::dac_ifs::dac_sim_subcmd_if::parse_cmd(std::string command, std::array<std::string, N> delimiters)
In this case, how can I proceed?
Can you use c++17? I'd use fold-expressions:
auto parse_cmd(std::string_view str, auto const&... delim) {
    namespace qi = boost::spirit::qi;
    std::vector<std::string> vec;
    qi::parse(str.begin(), str.end(),
              qi::as_string[*(qi::char_ - ... - delim)] % (qi::lit(delim) | ...) //
                  > qi::eoi,
              vec);
    return vec;
}
Test it Live On Coliru
for (auto input :
     {
         "",
         "|",
         "|,",
         "|,||",
         "foo||bar,qux,stux;net||more||||to,come",
     }) //
{
    fmt::print("{:<30} -> {}\n", fmt::format("'{}'", input), parse_cmd(input, "||", ","));
}
Prints
'' -> [""]
'|' -> ["|"]
'|,' -> ["|", ""]
'|,||' -> ["|", "", ""]
'foo||bar,qux,stux;net||more||||to,come' -> ["foo", "bar", "qux", "stux;net", "more", "", "to", "come"]
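To see what the two fold-expressions expand to, here is a small stdlib-only sketch. Expr, quoted and lit are made-up stand-ins (not Spirit types) that merely record the shape of the expression as text, and it assumes a c++17 compiler:

```cpp
#include <string>

// Hypothetical stand-in for a parser expression: it only remembers its own text.
struct Expr {
    std::string text;
};

Expr operator-(Expr const& a, Expr const& b) { return Expr{"(" + a.text + " - " + b.text + ")"}; }
Expr operator|(Expr const& a, Expr const& b) { return Expr{"(" + a.text + " | " + b.text + ")"}; }

// Illustrative helpers that show a delimiter as "x" or lit("x").
Expr quoted(std::string const& s) { return Expr{"\"" + s + "\""}; }
Expr lit(std::string const& s) { return Expr{"lit(\"" + s + "\")"}; }

// Binary left fold: (init - ... - pack) groups as ((init - a) - b).
template <typename... D>
std::string difference_shape(D const&... delim) {
    return (Expr{"char_"} - ... - quoted(delim)).text;
}

// Unary right fold: (pack | ...) groups as (a | (b | c)).
template <typename... D>
std::string alternative_shape(D const&... delim) {
    return (lit(delim) | ...).text;
}
```

For the delimiters "||" and ",", difference_shape produces ((char_ - "||") - ","), which has the same shape as the hand-written two-delimiter version in the question. Note that the unary fold over | needs at least one delimiter; an empty pack does not compile.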
You can always use the index-sequence trick to transform into a parameter pack:
template <size_t N>
auto parse_cmd(std::string_view str, std::array<std::string, N> const& delims) {
    return [&]<size_t... I>(std::index_sequence<I...>) {
        return do_parse_cmd(str, delims[I]...);
    }(std::make_index_sequence<N>{});
}
Where do_parse_cmd is the fold-expression function shown above (renamed from parse_cmd). Let's demo with ";" added as a third delimiter: Live On Coliru
std::array<std::string, 3> delimiters{"||", ",", ";"};
for (auto input :
     {
         "",
         "|",
         "|,",
         "|,||",
         "foo||bar,qux,stux;net||more||||to,come",
     }) //
{
    fmt::print("{:<15} -> {}\n", fmt::format("'{}'", input), parse_cmd(input, delimiters));
}
Prints
'' -> [""]
'|' -> ["|"]
'|,' -> ["|", ""]
'|,||' -> ["|", "", ""]
'foo||bar,qux,stux;net||more||||to,come' -> ["foo", "bar", "qux", "stux", "net", "more", "", "to", "come"]
Note how stux;net is correctly split now.
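If c++20 template lambdas are not available, the same index-sequence expansion can be written with a named helper. A minimal sketch, assuming only the standard library: collect_delims is a hypothetical stand-in for do_parse_cmd that merely records the arguments it receives, and expand/expand_impl are illustrative names:

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for do_parse_cmd: it only records the delimiters it
// is called with, so the pack expansion can be observed without Boost.
template <typename... Delims>
std::vector<std::string> collect_delims(Delims const&... delims) {
    return {std::string(delims)...};
}

// Receives the indices 0..N-1 as a pack and expands delims[I]... into arguments.
template <std::size_t N, std::size_t... I>
std::vector<std::string> expand_impl(std::array<std::string, N> const& delims,
                                     std::index_sequence<I...>) {
    return collect_delims(delims[I]...);
}

// Entry point: generates the index sequence matching the array size.
template <std::size_t N>
std::vector<std::string> expand(std::array<std::string, N> const& delims) {
    return expand_impl(delims, std::make_index_sequence<N>{});
}
```

Calling expand on an array holding "||", "," and ";" passes the three strings as three separate arguments, which is exactly what the lambda version does inline.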
For one, the above requires c++17 for the fold-expressions, and the samples also liberally use c++20 features (abbreviated function templates with auto parameters, template lambdas) to make it all easy to demonstrate. Without c++20, even the c++17 version becomes a lot more tedious.
There's an issue when the caller passes delimiters in a sub-optimal order. E.g., {":", ":|:"} won't work, but {":|:", ":"} will. That's because the patterns overlap: the alternatives are tried in order, so ":" matches first and ":|:" is never reached. You would want to be smarter.
You might want full-blown parser expression capability instead of fixed string literals. I'll come back to this later.
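The ordering problem can be reproduced without Spirit. The following is a rough stdlib-only imitation of ordered choice; split_first_match is a made-up name, and this is not how Qi is implemented, it only mirrors the "first alternative that matches wins" rule:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Split `input`, trying the delimiters in the order given. The first one that
// matches at the current position wins, like ordered alternatives (a | b).
std::vector<std::string> split_first_match(std::string const& input,
                                           std::vector<std::string> const& delims) {
    std::vector<std::string> tokens(1); // always at least one (possibly empty) token
    std::size_t pos = 0;
    while (pos < input.size()) {
        std::size_t matched = 0;
        for (auto const& d : delims) {
            assert(!d.empty()); // an empty delimiter would match everywhere
            if (input.compare(pos, d.size(), d) == 0) {
                matched = d.size();
                break; // first match wins, even if a longer delimiter comes later
            }
        }
        if (matched) {
            tokens.emplace_back(); // delimiter found: start the next token
            pos += matched;
        } else {
            tokens.back() += input[pos++]; // ordinary character: extend current token
        }
    }
    return tokens;
}
```

Splitting "a:|:b" with {":", ":|:"} gives "a", "|", "b" because ":" wins before ":|:" is ever tried, whereas {":|:", ":"} gives the intended "a", "b".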
To support c++11 and solve the semantic issue, let's use qi::symbols:
using tokens = std::vector<std::string>;

template <size_t N> tokens
parse_cmd(std::string const& str, std::array<std::string, N> const& delims) {
    namespace qi = boost::spirit::qi;

    qi::symbols<char> delim;
    for (auto& d : delims)
        delim += d;

    tokens vec;
    parse(str.begin(), str.end(), qi::as_string[*(qi::char_ - delim)] % delim > qi::eoi, vec);
    return vec;
}
This internally builds a Trie, so the order in which delimiters are passed doesn't matter: a single delim match always consumes the longest matching delimiter.
With the same test: Live On Coliru (c++11)
'' -> [""]
'|' -> ["|"]
'|,' -> ["|", ""]
'|,||' -> ["|", "", ""]
'foo||bar,qux,stux;net||more||||to,come' -> ["foo", "bar", "qux", "stux", "net", "more", "", "to", "come"]
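To illustrate why a Trie makes the order irrelevant, here is a hand-rolled sketch using only the standard library. DelimiterTrie and split_longest_match are hypothetical names, and this is not Spirit's actual data structure, just the same idea: walk the trie from the current position and remember the longest delimiter seen.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Minimal trie over delimiter strings (illustrative only).
struct TrieNode {
    bool terminal = false; // true if some delimiter ends at this node
    std::map<char, std::unique_ptr<TrieNode>> next;
};

class DelimiterTrie {
  public:
    void add(std::string const& delim) {
        assert(!delim.empty()); // an empty delimiter would match everywhere
        TrieNode* node = &root_;
        for (char c : delim) {
            auto& child = node->next[c];
            if (!child)
                child.reset(new TrieNode);
            node = child.get();
        }
        node->terminal = true;
    }

    // Length of the longest delimiter starting at input[pos], or 0 if none.
    std::size_t longest_match(std::string const& input, std::size_t pos) const {
        std::size_t best = 0;
        TrieNode const* node = &root_;
        for (std::size_t i = pos; i < input.size(); ++i) {
            auto it = node->next.find(input[i]);
            if (it == node->next.end())
                break;
            node = it->second.get();
            if (node->terminal)
                best = i - pos + 1; // keep going: a longer delimiter may follow
        }
        return best;
    }

  private:
    TrieNode root_;
};

std::vector<std::string> split_longest_match(std::string const& input,
                                             std::vector<std::string> const& delims) {
    DelimiterTrie trie;
    for (auto const& d : delims)
        trie.add(d);

    std::vector<std::string> tokens(1); // always at least one (possibly empty) token
    for (std::size_t pos = 0; pos < input.size();) {
        if (std::size_t len = trie.longest_match(input, pos)) {
            tokens.emplace_back(); // delimiter found: start the next token
            pos += len;
        } else {
            tokens.back() += input[pos++];
        }
    }
    return tokens;
}
```

With this approach both {":", ":|:"} and {":|:", ":"} split "a:|:b" into "a" and "b", since the insertion order only affects how the trie is built, not what it matches.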
To be completely flexible and compose the parser from any parser expression, you would have to thread the needle in Qi, and get considerable compile times:
Suffice it to say, I won't recommend it. However, using X3¹ none of this is hard, and you could easily achieve it: Live On Coliru. 'Nuff said.
Basically replacing std::string with auto in the fold-expression variant:
auto parse_cmd(std::string const& str, auto... delims) {
    tokens vec;
    parse(str.begin(), str.end(),
          *(x3::char_ - ... - x3::as_parser(delims)) //
                  % (x3::as_parser(delims) | ...)    //
              > x3::eoi,
          vec);
    return vec;
}
Now you can do funky stuff, like: Live On Coliru
static constexpr auto input = "foo (false) bar ( true ) qux (4.8e-9) <!-- any comment --> quz";
fmt::print("input: '{}'\n", input);
auto test = [](auto name, auto... p) {
    fmt::print("{:>5}: {}\n", name, parse_cmd(input, p...));
};
constexpr auto d = "(" >> x3::double_ >> ")";
constexpr auto b = x3::skip(x3::blank)["(" >> x3::bool_ >> ")"];
constexpr auto x = "<!--" >> *(x3::char_ - "-->") >> "-->";
test("d", d);
test("b", b);
test("x", x);
test("x|b|d", x, b, d);
Printing
input: 'foo (false) bar ( true ) qux (4.8e-9) <!-- any comment --> quz'
d: ["foo (false) bar ( true ) qux ", " <!-- any comment --> quz"]
b: ["foo", " bar", " qux (4.8e-9) <!-- any comment --> quz"]
x: ["foo (false) bar ( true ) qux (4.8e-9) ", " quz"]
x|b|d: ["foo", " bar", " qux ", " ", " quz"]
Combining parsers in X3 is a joy, and crazy powerful. It will typically still be faster to compile than the Qi parsers.
Note that at no point in this answer did I question why you are reinventing tokenization using a (checks notes) parser generator. Perhaps you should tell me what you're actually building or parsing, and I could give you some real advice on how to use Spirit for great success :)
¹ which requires at least c++14 and will require c++17 in the future