c++ parsing boost-spirit-qi

Parsing a simple repeated text macro with Boost.Spirit


I'm learning how to use the Boost.Spirit library for parsing strings. It seems to be a very nice tool, but a difficult one as well. I want to parse a string of words separated by / and put them in a vector of strings. Here is an example: word1/word2/word3. That's a simple task; I can do it with the following call:

bool r = phrase_parse(first, last, (+~char_("/") % qi::lit("/")), space, v);

where v is a std::vector<std::string>. In general, though, I'd like to parse something like w1/[w2/w3]2/w4, which is equivalent to w1/w2/w3/w2/w3/w4; that is, [w2/w3]2 means that w2/w3 is repeated twice. Could anyone give me some ideas on that? I read the documentation but still have some problems.

Thank you in advance!


Solution

  • Fully working demo: live on Coliru

    Compared with a naive approach, the raw rule also stops at ] when the parser is inside a group (that is, when in_group is true), so a word never swallows the bracket that closes its group.

    I elected to pass that state down as an inherited attribute (a bool).

    This implementation allows nested sub-groups as well, e.g.: "[w1/[w2/w3]2/w4]3"

    #define BOOST_SPIRIT_DEBUG
    #include <boost/spirit/include/qi.hpp>
    #include <boost/spirit/include/phoenix.hpp>
    #include <algorithm>
    #include <iostream>
    #include <iterator>
    #include <string>
    #include <vector>
    
    namespace phx = boost::phoenix;
    
    int main()
    {
        typedef std::string::const_iterator It;
        const std::string input = "[w1/[w2/w3]2/w4]3";
    
        std::vector<std::string> v;
        It first(input.begin()), last(input.end());
    
        using namespace boost::spirit::qi;
    
        // raw has no skipper, so a word is matched character by character
        rule<It, std::string(bool in_group)> raw;
        rule<It, std::vector<std::string>(bool in_group), space_type> 
            group, 
            delimited;
    
        _r1_type in_group; // friendly alias for the inherited attribute
    
        // inside a group a word also ends at ']'; outside it only ends at '/'
        raw       = eps(in_group) >> +~char_("/]") 
                  | +~char_("/");
    
        // a '/'-separated list of groups and plain words
        delimited = (group(in_group)|raw(in_group)) % '/';
    
        // _1 is the list parsed between the brackets, _2 the repeat count:
        // append the list to the result once per repetition
        group     = ('[' >> delimited(in_group=true) >> ']' >> int_) 
            [ phx::while_(_2--) 
                [ phx::insert(_val, phx::end(_val), phx::begin(_1), phx::end(_1)) ]
            ];
    
        BOOST_SPIRIT_DEBUG_NODES((raw)(delimited)(group));
    
        // the top level is not inside a group
        bool r = phrase_parse(first, last, 
                delimited(false),
                space,v);
    
        if (r)
            std::copy(v.begin(), v.end(), std::ostream_iterator<std::string>(std::cout, "\n"));
        else
            std::cerr << "Parse failed\n";
    }
    

    Prints:

    w1
    w2
    w3
    w2
    w3
    w4
    w1
    w2
    w3
    w2
    w3
    w4
    w1
    w2
    w3
    w2
    w3
    w4
    

    (in addition to the debug trace enabled by BOOST_SPIRIT_DEBUG)
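
To make the role of the in_group flag and the repeat step easier to see without the Spirit machinery, here is a minimal hand-written sketch of the same idea in plain C++. It is not the code from the demo above, and it is not a drop-in replacement: parse_list and expand are hypothetical helper names, whitespace is not skipped, and malformed input throws instead of backtracking the way the Qi rules do.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of Boost.Spirit). Parses
//   list := item ('/' item)*
//   item := '[' list ']' digits | word
// starting at input[pos] and returns the expanded words.
// in_group plays the same role as the inherited attribute in the Qi version:
// when it is true, a word also ends at ']'.
std::vector<std::string> parse_list(const std::string& input, std::size_t& pos, bool in_group)
{
    std::vector<std::string> out;
    for (;;) {
        if (pos < input.size() && input[pos] == '[') {
            ++pos; // consume '['
            std::vector<std::string> inner = parse_list(input, pos, true);
            if (pos >= input.size() || input[pos] != ']')
                throw std::runtime_error("expected ']'");
            ++pos; // consume ']'

            // read the repeat count that follows the closing bracket
            const std::size_t start = pos;
            int count = 0;
            while (pos < input.size() && std::isdigit(static_cast<unsigned char>(input[pos]))) {
                count = count * 10 + (input[pos] - '0');
                ++pos;
            }
            if (pos == start)
                throw std::runtime_error("expected a repeat count after ']'");

            // append the inner list once per repetition
            for (int i = 0; i < count; ++i)
                out.insert(out.end(), inner.begin(), inner.end());
        } else {
            // plain word: runs up to '/', or up to ']' when inside a group
            const std::size_t start = pos;
            while (pos < input.size() && input[pos] != '/' && !(in_group && input[pos] == ']'))
                ++pos;
            if (pos == start)
                throw std::runtime_error("expected a word");
            out.push_back(input.substr(start, pos - start));
        }

        if (pos < input.size() && input[pos] == '/') {
            ++pos; // consume the separator and parse the next item
            continue;
        }
        return out;
    }
}

// Hypothetical entry point: expands the whole string or throws.
std::vector<std::string> expand(const std::string& input)
{
    std::size_t pos = 0;
    std::vector<std::string> words = parse_list(input, pos, false);
    if (pos != input.size())
        throw std::runtime_error("unexpected trailing input");
    return words;
}
```

Calling expand("w1/[w2/w3]2/w4") should give the six words w1, w2, w3, w2, w3, w4, and the nested input from the demo, "[w1/[w2/w3]2/w4]3", should give that sequence three times over. The recursion mirrors the way group refers back to delimited in the grammar, which is what makes nested sub-groups work.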