I'm learning how to use the Boost.Spirit library for parsing strings. It seems to be a very nice tool, but a difficult one as well. I want to parse a string of words separated by / and put them in a vector of strings. Here is an example: word1/word2/word3. That's a simple task; I can do it with the following function:
bool r = phrase_parse(first, last, (+~char_("/") % qi::lit("/")), space, v);
where v is a std::vector<std::string>. But in general, I'd like to parse something like w1/[w2/w3]2/w4, which is equivalent to w1/w2/w3/w2/w3/w4; that is, [w2/w3]2 means that w2/w3 is repeated twice. Could anyone give me some ideas on that? I read the documentation but still have some problems.
Thank you in advance!
Fully working demo: live on Coliru
What this adds over a naive approach is that raw values optionally end at ] when the parser state is in_group.
I elected to pass that state using an inherited attribute (a bool).
This implementation allows nested sub-groups as well, e.g.: "[w1/[w2/w3]2/w4]3"
#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

namespace phx = boost::phoenix;

int main()
{
    typedef std::string::const_iterator It;
    const std::string input = "[w1/[w2/w3]2/w4]3";
    std::vector<std::string> v;
    It first(input.begin()), last(input.end());

    using namespace boost::spirit::qi;

    // raw has no skipper, so a word is matched character by character
    rule<It, std::string(bool in_group)> raw;
    rule<It, std::vector<std::string>(bool in_group), space_type>
        group,
        delimited;

    _r1_type in_group; // friendly alias for the inherited attribute

    // inside a group a word also stops at ']'; outside it stops only at '/'
    raw       = eps(in_group) >> +~char_("/]")
              | +~char_("/");

    delimited = (group(in_group)|raw(in_group)) % '/';

    // append the group's contents (_1) to the result as many times as the count (_2) says
    group     = ('[' >> delimited(in_group=true) >> ']' >> int_)
        [ phx::while_(_2--)
            [ phx::insert(_val, phx::end(_val), phx::begin(_1), phx::end(_1)) ]
        ];

    BOOST_SPIRIT_DEBUG_NODES((raw)(delimited)(group));

    bool r = phrase_parse(first, last,
            delimited(false),
            space, v);

    if (r)
        std::copy(v.begin(), v.end(), std::ostream_iterator<std::string>(std::cout, "\n"));
}
Prints:
w1
w2
w3
w2
w3
w4
w1
w2
w3
w2
w3
w4
w1
w2
w3
w2
w3
w4
(besides debug info)
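
For readers who want to see the expansion logic without the Spirit machinery, here is a minimal hand-written recursive-descent sketch of the same idea. It is not the grammar above and does not use Boost: parse_delimited and expand are hypothetical names made up for this illustration, the in_group parameter merely plays the role of the inherited attribute, and the sketch assumes no whitespace skipping and treats every '[' at the start of an item as the beginning of a group.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of Boost.Spirit): parses a '/'-separated list
// starting at pos and appends the expanded words to out.
// in_group mirrors the inherited attribute: inside brackets a word also ends at ']'.
static bool parse_delimited(const std::string& s, std::size_t& pos,
                            bool in_group, std::vector<std::string>& out)
{
    for (;;) {
        if (pos < s.size() && s[pos] == '[') {
            // group: '[' list ']' count
            ++pos;
            std::vector<std::string> inner;
            if (!parse_delimited(s, pos, true, inner)) return false;
            if (pos >= s.size() || s[pos] != ']') return false;
            ++pos;

            const std::size_t digits_start = pos;
            int count = 0;
            while (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos]))) {
                count = count * 10 + (s[pos] - '0');
                ++pos;
            }
            if (pos == digits_start) return false; // the repeat count is required

            // append the group's words count times
            for (int i = 0; i < count; ++i)
                out.insert(out.end(), inner.begin(), inner.end());
        } else {
            // raw word: runs up to '/', and up to ']' as well when inside a group
            const std::size_t word_start = pos;
            while (pos < s.size() && s[pos] != '/' && !(in_group && s[pos] == ']'))
                ++pos;
            if (pos == word_start) return false; // empty words are rejected
            out.push_back(s.substr(word_start, pos - word_start));
        }

        if (pos < s.size() && s[pos] == '/')
            ++pos;          // another item follows
        else
            return true;    // end of this list
    }
}

// Hypothetical entry point: succeeds only if the whole input is consumed.
bool expand(const std::string& input, std::vector<std::string>& out)
{
    std::size_t pos = 0;
    return parse_delimited(input, pos, false, out) && pos == input.size();
}
```

With this sketch, calling expand("w1/[w2/w3]2/w4", v) fills v with w1, w2, w3, w2, w3, w4, and the nested input from the demo yields the same 18 words as the Spirit version. The Spirit grammar is still the better fit if you need skipping, debugging output, or want to compose the rules into a larger parser.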