c++ boost boost-spirit boost-spirit-qi

How to create an optional parser that is able to conditionally drop synthesized items


I am trying to create an optional parser rule: depending on the value of the first attribute, I want to optionally emit a data item.

For example, given the input:

x,2,3
y,3,4
x,5,6

If the first character is a y, the line should be discarded; otherwise it is processed. In this example, the bool is true if the third attribute is >= 4. The synthesized attribute should be a std::pair<bool, unsigned int>, where the unsigned int value is the second attribute. The parser is:

namespace qi = boost::spirit::qi;
using Data = std::pair<bool, unsigned>;
BOOST_PHOENIX_ADAPT_FUNCTION(Data, make_pair, std::make_pair, 2);

class DataParser :
    public qi::grammar<
    std::string::iterator,
    boost::spirit::char_encoding::ascii,
    boost::spirit::ascii::space_type,
    std::vector<Data>()
    >
{
    qi::rule<iterator_type, encoding_type, bool()> type;
    qi::rule<iterator_type, encoding_type, bool()> side;
    // doesn't compile: qi::rule<iterator_type, encoding_type, boost::spirit::ascii::space_type, boost::optional<Data>()> line;
    qi::rule<iterator_type, encoding_type, boost::spirit::ascii::space_type, qi::locals<bool, unsigned, bool>, Data()> line;
    qi::rule<iterator_type, encoding_type, boost::spirit::ascii::space_type, sig_type> start;

public:
    DataParser()
        : base_type(start)
    {
        using namespace qi::labels;

        type = qi::char_[_val = _1 == 'x'];
        side = qi::int_[_val = _1 >= 4];
        line %= (qi::omit[type[_a = _1]] >> ',' >> qi::omit[qi::uint_[_b = _1]] >> ',' >> qi::omit[side[_c = _1]])[if_(_a)[_val = make_pair(_c, _b)]];
        // doesn't compile: line %= (qi::omit[type[_a = _1]] >> ',' >> qi::omit[qi::uint_[_b = _1]] >> ',' >> qi::omit[side[_c = _1]])[if_(_a)[_val = make_pair(_c, _b)].else_[_val = qi::unused]];
        // doesn't compile: line %= (type >> ',' >> qi::uint_ >> ',' >> side)[if_(_1)[_val = make_pair(_3, _2)]];
        // doesn't compile: line %= (type >> ',' >> qi::uint_ >> ',' >> side)[if_(_1)[_val = make_pair(_3, _2)].else_[_val = unused]];
        start = *line;
    }
};

I get: [[false, 2], [false, 0], [true, 5]] where I want to get: [[false, 2], [true, 5]] (the second entry should be discarded).

I tried using boost::optional<Data> as the attribute of the line rule, and also assigning unused to _val, but neither worked.

Edit after fixing the issue with the accepted answer

The new rules are now:

using Data = std::pair<bool, unsigned>;
BOOST_PHOENIX_ADAPT_FUNCTION(Data, make_pair, std::make_pair, 2);

class DataParser :
    public qi::grammar<
        std::string::iterator,
        boost::spirit::char_encoding::ascii,
        boost::spirit::ascii::blank_type,
        std::vector<Data>()
    >
{
    using Items = boost::fusion::vector<bool, unsigned, bool>;

    qi::rule<iterator_type, encoding_type, bool()> type;
    qi::rule<iterator_type, encoding_type, bool()> side;
    qi::rule<iterator_type, encoding_type, boost::spirit::ascii::blank_type, Items()> line;
    qi::rule<iterator_type, encoding_type, boost::spirit::ascii::blank_type, sig_type> start;

public:
    DataParser()
        : base_type(start)
    {
        using namespace qi::labels;
        namespace px = boost::phoenix;

        type = qi::char_[_val = _1 == 'x'];
        side = qi::int_[_val = _1 >= 4];
        line = type >> ',' >> qi::uint_ >> ',' >> side;
        start = line[if_(_1)[px::push_back(_val, make_pair(_3, _2))]] % qi::eol;
    }
};

The key point is to use a semantic action to decide whether the synthesized attribute should be added, using all the attributes of the previous rule (in this case, line).


Solution

  • Okay. You use lots of power-tools. But remember, with great power comes....

    In particular, qi::locals, phoenix, semantic actions: they're all complicating life so only use them as a last resort (or when they're a natural fit, which is rarely¹).

    Think directly,

     start = *line;
    
     line = // ....
    

    When you say

    If the first character is a y then the line should be discarded. Otherwise it will be processed.

    You can express this directly:

     line = !qi::lit('y') >> // ...
    

    Alternatively, spell out what starters to accept:

     line = qi::omit[ qi::char_("xz") ] >> // ...
    

    Done.

    Straight Forward Mapping

    Here I'll cheat by re-ordering the pair<unsigned, bool> so it matches the input order. Now everything works out of the box without "any" magic:

    line   = !qi::lit('y') >> qi::omit[qi::alnum] >> ',' >> qi::int_ >> ',' >> side;
    ignore = +(qi::char_ - qi::eol);
    
    start = qi::skip(qi::blank) [ (line | ignore) % qi::eol ];
    

    However, it WILL result in spurious entries, as you noticed: Live On Compiler Explorer

    Parsed: {(2, false), (0, false), (5, true)}
    

    Improving

    Now you could hack around this by changing the eol to also eat subsequent lines that don't appear to contain valid data. However, that becomes unwieldy, and we still have the desire to flip the pair's members.

    So, here's where I think an action could be handy:

      public:
        DataParser() : DataParser::base_type(start) {
            using namespace qi::labels;
    
            start  = qi::skip(qi::blank) [
                  (qi::char_ >> ',' >> qi::uint_ >> ',' >> qi::int_) [
                      _pass = process(_val, _1, _2, _3) ]
                % qi::eol ];
        }
    
      private:
        struct process_f {
            template <typename... T>
            bool operator()(Datas& into, char id, unsigned type, int side) const {
                switch(id) {
                    case 'z': case 'x':
                        into.emplace_back(side >= 4, type);
                        break;
                    case 'y': // ignore
                        break;
                    case 'a':
                        return false; // fail the rule
                }
                return true;
            }
        };
    
        boost::phoenix::function<process_f> process;
    

    As you can see, there's a nice separation of concerns now. You parse (char, unsigned, int) and conditionally process it. That's what keeps this relatively simple compared to your attempts.

    Live Demo

    Live On Compiler Explorer

    #include <boost/fusion/adapted.hpp>
    #include <boost/spirit/include/qi.hpp>
    #include <boost/spirit/include/phoenix.hpp>
    #include <fmt/ranges.h>
    namespace qi = boost::spirit::qi;
    
    using Data = std::pair<bool, unsigned>;
    using Datas = std::vector<Data>;
    
    template <typename It>
    class DataParser : public qi::grammar<It, Datas()> {
        using Skipper = qi::blank_type;
        qi::rule<It, Datas(), Skipper> line;
        qi::rule<It, Datas()> start;
    
      public:
        DataParser() : DataParser::base_type(start) {
            using namespace qi::labels;
    
            start  = qi::skip(qi::blank) [
                  (qi::char_ >> ',' >> qi::uint_ >> ',' >> qi::int_) [
                      _pass = process(_val, _1, _2, _3) ]
                % qi::eol ];
        }
    
      private:
        struct process_f {
            template <typename... T>
            bool operator()(Datas& into, char id, unsigned type, int side) const {
                switch(id) {
                    case 'z': case 'x':
                        into.emplace_back(side >= 4, type);
                        break;
                    case 'y': // ignore
                        break;
                    case 'a':
                        return false; // fail the rule
                }
                return true;
            }
        };
    
        boost::phoenix::function<process_f> process;
    };
    
    int main() {
        using It = std::string::const_iterator;
        DataParser<It> p;
    
        for (std::string const input : {
                "x,2,3\ny,3,4\nx,5,6", 
                })
        {
            auto f = begin(input), l = end(input);
            Datas d;
            auto ok = qi::parse(f, l, p, d);
    
            if (ok) {
                fmt::print("Parsed: {}\n", d);
            } else {
                fmt::print("Parse failed\n");
            }
    
            if (f!=l) {
                fmt::print("Remaining unparsed: '{}'\n", std::string(f,l));
            }
        }
    }
    

    Prints

    Parsed: {(false, 2), (true, 5)}
    

    ¹ Boost Spirit: "Semantic actions are evil"?