Tags: c++, parsing, boost, c++14, boost-spirit-x3

Parsing comma-separated list of ranges and numbers with semantic actions


Using Boost.Spirit X3, I want to parse a comma-separated list of ranges and individual numbers (e.g. 1-4, 6, 7, 9-12) into a single std::vector<int>. Here's what I've come up with:

namespace ast {
    struct range 
    {
        int first_, last_;    
    };    

    using expr = std::vector<int>;    
}

namespace parser {        
    template<typename T>
    auto as_rule = [](auto p) { return x3::rule<struct _, T>{} = x3::as_parser(p); };

    auto const push = [](auto& ctx) { 
        x3::_val(ctx).push_back(x3::_attr(ctx)); 
    };  

    auto const expand = [](auto& ctx) { 
        for (auto i = x3::_attr(ctx).first_; i <= x3::_attr(ctx).last_; ++i) 
            x3::_val(ctx).push_back(i);  
    }; 

    auto const number = x3::uint_;
    auto const range  = as_rule<ast::range> (number >> '-' >> number                   ); 
    auto const expr   = as_rule<ast::expr>  ( -(range [expand] | number [push] ) % ',' );
} 

Given the input

    "1,2,3,4,6,7,9,10,11,12",   // individually enumerated
    "1-4,6-7,9-12",             // short-hand: using three ranges

this is successfully parsed as:

OK! Parsed: 1, 2, 3, 4, 6, 7, 9, 10, 11, 12, 
OK! Parsed: 1, 2, 3, 4, 6, 7, 9, 10, 11, 12, 

Question: I think I understand that applying the semantic action expand to the range part is necessary, but why do I also have to apply the semantic action push to the number part? Without it (i.e. with a plain -(range [expand] | number) % ',' rule for expr), the individual numbers don't get propagated into the AST:

OK! Parsed: 
OK! Parsed: 1, 2, 3, 4, 6, 7, 9, 10, 11, 12, 

Bonus Question: do I even need semantic actions at all to do this? The Spirit X3 documentation seems to discourage them.


Solution

  • The FAQ answer here is that semantic actions suppress automatic attribute propagation, on the assumption that the semantic action will take care of it instead. That is why a plain number, without [push], contributes nothing to the result.

    In general there are two approaches:

    • either use operator%= instead of operator= to assign the definition to the rule

    • or use the third (optional) template argument to the rule<> template, which can be specified as true to force automatic propagation semantics.


    Simplified sample

    Here, I simplify mostly by moving the semantic action inside the range rule itself. Now, we can drop the ast::range type altogether. No more fusion adaptation.

    Instead we use the "naturally" synthesized attribute of number >> '-' >> number, which is a fusion sequence of two unsigned integers (fusion::deque<unsigned, unsigned> in this case, because number is x3::uint_).

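    Now all that's left is to make sure both branches of | yield compatible types. Wrapping number in x3::repeat(1)[] fixes that: a single number then synthesizes a container, which is compatible with the ast::expr attribute of the range rule.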


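    #include <boost/spirit/home/x3.hpp>
    #include <boost/fusion/include/at_c.hpp>
    #include <algorithm>
    #include <iostream>
    #include <iterator>
    #include <string>
    #include <vector>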
    
    namespace x3 = boost::spirit::x3;
    
    namespace ast {
        using expr = std::vector<int>;    
    
        struct printer {
            std::ostream& out;
    
            void operator()(expr const& e) const {
                std::copy(std::begin(e), std::end(e), std::ostream_iterator<expr::value_type>(out, ", "));
            }
        };    
    }
    
    namespace parser {        
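        // Expands the (first, last) pair synthesized by number >> '-' >> number
        // into the enclosing rule's vector, one element per value in the range.
        auto const expand = [](auto& ctx) { 
            using boost::fusion::at_c;
    
            for (auto i = at_c<0>(x3::_attr(ctx)); i <= at_c<1>(x3::_attr(ctx)); ++i) 
                x3::_val(ctx).push_back(i);  
        }; 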
    
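        auto const number = x3::uint_;
        // range fills its own ast::expr through the semantic action
        auto const range  = x3::rule<struct _r, ast::expr> {} = (number >> '-' >> number) [expand]; 
        // repeat(1)[number] makes a single number synthesize a container, matching range
        auto const expr   = x3::rule<struct _e, ast::expr> {} = -(range | x3::repeat(1)[number]  ) % ',';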
    } 
    
    template<class Phrase, class Grammar, class Skipper, class AST, class Printer>
    auto test(Phrase const& phrase, Grammar const& grammar, Skipper const& skipper, AST& data, Printer const& print)
    {
        auto first = phrase.begin();
        auto last = phrase.end();
        auto& out = print.out;
    
        auto const ok = phrase_parse(first, last, grammar, skipper, data);
        if (ok) {
            out << "OK! Parsed: "; print(data); out << "\n";
        } else {
            out << "Parse failed:\n";
            out << "\t on input: " << phrase << "\n";
        }
        if (first != last)
            out << "\t Remaining unparsed: '" << std::string(first, last) << "'\n";    
    }
    
    int main() {
        std::string numeric_tests[] =
        {
            "1,2,3,4,6,7,9,10,11,12",   // individually enumerated
            "1-4,6-7,9-12",             // short-hand: using three ranges
        };
    
        for (auto const& t : numeric_tests) {
            ast::expr numeric_data;
            test(t, parser::expr, x3::space, numeric_data, ast::printer{std::cout});
        }
    }
    

    Prints:

    OK! Parsed: 1, 2, 3, 4, 6, 7, 9, 10, 11, 12, 
    OK! Parsed: 1, 2, 3, 4, 6, 7, 9, 10, 11, 12,