Parsing struct with enum fields and STL containers easily using Boost Spirit/Fusion


I'm new to Boost, and I need Boost Spirit to write a simple parser that fills some data structures.

Here is roughly what they look like:

struct Task
{
    const string dataname;
    const Level level;
    const string aggregator;
    const set<string> groupby;
    void operator()();
};


struct Schedule
{
    map<Level, ComputeTask> tasks;
    // I left this member in just to make it clear that
    // the struct wrapping the map is not
    // useless (this is not the full code)
    void operator()(const InstancePtr &node); 
};

Regarding Task, I don't know how I could use BOOST_FUSION_ADAPT_STRUCT, as mentioned in the employee example, or a variant, to make it work with enum and STL container fields.

The same question applies to Schedule, but this time a field also uses a user-defined type (possibly already adapted to Fusion; does the adaptation work recursively?).

I am still designing the file format, so the struct definitions and the format itself may change. That is why I prefer using Boost over hand-crafted code that is hard to maintain. I am also doing this to learn.

Here is what the file could look like:

level: level operation name on(data1, data2, data3)
level: level operation name on()
level: level operation name on(data1, data2)

Each line is an entry of the map in Schedule: the part preceding the : is the key, and the rest defines the Task. Here level is replaced by a keyword corresponding to the enum Level, and similarly for operation; name is one of the allowed names (from a set of keywords); on() is a keyword, and inside the parentheses are zero or more user-provided strings that fill the set<string> groupby field of a Task.

I want it to be readable, and I could even add English keywords that add nothing but readability; that is another reason to use a parsing library instead of hand-crafted code.

Feel free to ask for more details if you think my question is not clear enough.

Thank you.


Solution

  • I'm making some assumptions, since your examples don't make the meaning very clear. But here goes:

    Going with a random enum:

    enum class Level { One, Two, Three, LEVEL };
    

    Sidenote: the std::set<> might need to be a sequential container, because group-by operations are usually not commutative (the order matters). I don't know about your domain, of course.
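
To make the sidenote concrete, here is a small sketch that uses only the standard library. The helpers `as_set` and `as_list` are made up for illustration and are not part of the question's code. The point is only that `std::set` sorts and de-duplicates its elements, whereas `std::vector` keeps them as written.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: collect group-by fields into a std::set.
// The result is sorted and duplicates are dropped.
std::set<std::string> as_set(std::vector<std::string> const& fields) {
    return std::set<std::string>(fields.begin(), fields.end());
}

// Hypothetical helper: collect the same fields into a std::vector.
// Input order and duplicates are preserved.
std::vector<std::string> as_list(std::vector<std::string> const& fields) {
    return std::vector<std::string>(fields.begin(), fields.end());
}
```

If the order of the `on(...)` arguments carries meaning in your domain, declaring `groupby` as a `std::vector<std::string>` would keep it; the grammar rule's attribute type would have to change accordingly.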

    Adapting:

    BOOST_FUSION_ADAPT_STRUCT(ComputeTask, level, aggregator, dataname, groupby)
    BOOST_FUSION_ADAPT_STRUCT(Schedule, tasks)
    

    Note that I subtly put the adapted fields in the grammar order. That helps a lot down the road, because the attributes of a parsed sequence are assigned to the adapted members by position.

    The simplest grammar that comes to mind:

    template <typename It>
    struct Parser : qi::grammar<It, Schedule()> {
        Parser() : Parser::base_type(_start) {
            using namespace qi;
    
            _any_word    = lexeme [ +char_("a-zA-Z0-9_./-") ]; // '-' last: a literal dash, not the range '9'-'_'
            _operation   = _any_word; // TODO
            _group_field = _any_word; // TODO
            _dataname    = _any_word; // TODO
    
            _level       = no_case [ _level_sym ];
            _groupby     = '(' >> -(_group_field % ',') >> ')';
            _task        = _level >> _operation >> _dataname >> "on" >> _groupby;
            _entry       = _level >> ':' >> _task;
            _schedule    = _entry % eol;
            _start       = skip(blank) [ _schedule ];
    
            BOOST_SPIRIT_DEBUG_NODES((_start)(_schedule)(_task)(_groupby)(_level)(_operation)(_dataname)(_group_field))
        }
      private:
        struct level_sym : qi::symbols<char, Level> {
            level_sym() { this->add
                ("one", Level::One)
                ("two", Level::Two)
                ("three", Level::Three)
                ("level", Level::LEVEL);
            }
        } _level_sym;
    
        // lexemes
        qi::rule<It, std::string()> _any_word;
        qi::rule<It, std::string()> _operation, _dataname, _group_field; // TODO
        qi::rule<It, Level()> _level;
    
        using Skipper = qi::blank_type;
        using Table = decltype(Schedule::tasks);
        using Entry = std::pair<Level, ComputeTask>;
    
        qi::rule<It, std::set<std::string>(), Skipper> _groupby;
        qi::rule<It, ComputeTask(), Skipper> _task;
        qi::rule<It, Entry(), Skipper> _entry;
        qi::rule<It, Table(), Skipper> _schedule;
        qi::rule<It, Schedule()> _start;
    };
    

    I changed the input to have unique keys for Level in the schedule; otherwise only one entry would actually end up in the map.
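
The reason is ordinary `std::map` behavior. The sketch below assumes the entries are added by insertion (which is my understanding of how parsed key/value pairs end up in an associative container); `build_table` is a made-up helper, and the mapped type is simplified to a string.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

enum class Level { One, Two, Three, LEVEL };

// Hypothetical helper: fill a map by insertion, one entry per parsed line.
std::map<Level, std::string>
build_table(std::vector<std::pair<Level, std::string>> const& entries) {
    std::map<Level, std::string> table;
    for (auto const& entry : entries)
        table.insert(entry); // has no effect when the key already exists
    return table;
}
```

With three lines that all start with the same key, the resulting table has a single element. If repeated keys are legitimate in your format, a `std::multimap` or a vector of pairs would be the more natural target.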

    int main() {
        Parser<std::string::const_iterator> const parser;
    
        for (std::string const input : { R"(ONE: level operation name on(data1, data2, data3)
    TWO: level operation name on()
    THREE: level operation name on(data1, data2))" })
        {
            auto f = begin(input), l = end(input);
            Schedule s;
            if (parse(f, l, parser, s)) {
                std::cout << "Parsed\n";
                for (auto& [level, task] : s.tasks) {
                    std::cout << level << ": " << task << "\n";
                }
            } else {
                std::cout << "Failed\n";
            }
    
            if (f != l) {
                std::cout << "Remaining unparsed input: " << std::quoted(std::string(f,l)) << "\n";
            }
        }
    }
    

    Prints

    Parsed
    One: LEVEL operation name on (data1, data2, data3)
    Two: LEVEL operation name on ()
    Three: LEVEL operation name on (data1, data2)
    

    And, additionally, with BOOST_SPIRIT_DEBUG defined:

    <_start>
      <try>ONE: level operation</try>
      <_schedule>
        <try>ONE: level operation</try>
        <_level>
          <try>ONE: level operation</try>
          <success>: level operation na</success>
          <attributes>[One]</attributes>
        </_level>
        <_task>
          <try> level operation nam</try>
          <_level>
            <try>level operation name</try>
            <success> operation name on(d</success>
            <attributes>[LEVEL]</attributes>
          </_level>
          <_operation>
            <try>operation name on(da</try>
            <success> name on(data1, data</success>
            <attributes>[[o, p, e, r, a, t, i, o, n]]</attributes>
          </_operation>
          <_dataname>
            <try>name on(data1, data2</try>
            <success> on(data1, data2, da</success>
            <attributes>[[n, a, m, e]]</attributes>
          </_dataname>
          <_groupby>
            <try>(data1, data2, data3</try>
            <_group_field>
              <try>data1, data2, data3)</try>
              <success>, data2, data3)\nTWO:</success>
              <attributes>[[d, a, t, a, 1]]</attributes>
            </_group_field>
            <_group_field>
              <try>data2, data3)\nTWO: l</try>
              <success>, data3)\nTWO: level </success>
              <attributes>[[d, a, t, a, 2]]</attributes>
            </_group_field>
            <_group_field>
              <try>data3)\nTWO: level op</try>
              <success>)\nTWO: level operati</success>
              <attributes>[[d, a, t, a, 3]]</attributes>
            </_group_field>
            <success>\nTWO: level operatio</success>
            <attributes>[[[d, a, t, a, 1], [d, a, t, a, 2], [d, a, t, a, 3]]]</attributes>
          </_groupby>
          <success>\nTWO: level operatio</success>
          <attributes>[[LEVEL, [o, p, e, r, a, t, i, o, n], [n, a, m, e], [[d, a, t, a, 1], [d, a, t, a, 2], [d, a, t, a, 3]]]]</attributes>
        </_task>
        <_level>
          <try>TWO: level operation</try>
          <success>: level operation na</success>
          <attributes>[Two]</attributes>
        </_level>
        <_task>
          <try> level operation nam</try>
          <_level>
            <try>level operation name</try>
            <success> operation name on()</success>
            <attributes>[LEVEL]</attributes>
          </_level>
          <_operation>
            <try>operation name on()\n</try>
            <success> name on()\nTHREE: le</success>
            <attributes>[[o, p, e, r, a, t, i, o, n]]</attributes>
          </_operation>
          <_dataname>
            <try>name on()\nTHREE: lev</try>
            <success> on()\nTHREE: level o</success>
            <attributes>[[n, a, m, e]]</attributes>
          </_dataname>
          <_groupby>
            <try>()\nTHREE: level oper</try>
            <_group_field>
              <try>)\nTHREE: level opera</try>
              <fail/>
            </_group_field>
            <success>\nTHREE: level operat</success>
            <attributes>[[]]</attributes>
          </_groupby>
          <success>\nTHREE: level operat</success>
          <attributes>[[LEVEL, [o, p, e, r, a, t, i, o, n], [n, a, m, e], []]]</attributes>
        </_task>
        <_level>
          <try>THREE: level operati</try>
          <success>: level operation na</success>
          <attributes>[Three]</attributes>
        </_level>
        <_task>
          <try> level operation nam</try>
          <_level>
            <try>level operation name</try>
            <success> operation name on(d</success>
            <attributes>[LEVEL]</attributes>
          </_level>
          <_operation>
            <try>operation name on(da</try>
            <success> name on(data1, data</success>
            <attributes>[[o, p, e, r, a, t, i, o, n]]</attributes>
          </_operation>
          <_dataname>
            <try>name on(data1, data2</try>
            <success> on(data1, data2)</success>
            <attributes>[[n, a, m, e]]</attributes>
          </_dataname>
          <_groupby>
            <try>(data1, data2)</try>
            <_group_field>
              <try>data1, data2)</try>
              <success>, data2)</success>
              <attributes>[[d, a, t, a, 1]]</attributes>
            </_group_field>
            <_group_field>
              <try>data2)</try>
              <success>)</success>
              <attributes>[[d, a, t, a, 2]]</attributes>
            </_group_field>
            <success></success>
            <attributes>[[[d, a, t, a, 1], [d, a, t, a, 2]]]</attributes>
          </_groupby>
          <success></success>
          <attributes>[[LEVEL, [o, p, e, r, a, t, i, o, n], [n, a, m, e], [[d, a, t, a, 1], [d, a, t, a, 2]]]]</attributes>
        </_task>
        <success></success>
        <attributes>[[[One, [LEVEL, [o, p, e, r, a, t, i, o, n], [n, a, m, e], [[d, a, t, a, 1], [d, a, t, a, 2], [d, a, t, a, 3]]]], [Two, [LEVEL, [o, p, e, r, a, t, i, o, n], [n, a, m, e], []]], [Three, [LEVEL, [o, p, e, r, a, t, i, o, n], [n, a, m, e], [[d, a, t, a, 1], [d, a, t, a, 2]]]]]]</attributes>
      </_schedule>
      <success></success>
      <attributes>[[[[One, [LEVEL, [o, p, e, r, a, t, i, o, n], [n, a, m, e], [[d, a, t, a, 1], [d, a, t, a, 2], [d, a, t, a, 3]]]], [Two, [LEVEL, [o, p, e, r, a, t, i, o, n], [n, a, m, e], []]], [Three, [LEVEL, [o, p, e, r, a, t, i, o, n], [n, a, m, e], [[d, a, t, a, 1], [d, a, t, a, 2]]]]]]]</attributes>
    </_start>
    

    Full Listing

    Live On Coliru

    //#define BOOST_SPIRIT_DEBUG
    #include <boost/spirit/include/qi.hpp>
    #include <boost/fusion/adapted.hpp>
    #include <string>
    #include <map>
    #include <set>
    #include <iostream>
    #include <iomanip>
    #include <experimental/iterator>
    
    enum class Level { One, Two, Three, LEVEL };
    
    struct ComputeTask {
        std::string dataname;
        Level level;
        std::string aggregator;
        std::set<std::string> groupby;
    };
    
    struct Schedule {
        std::map<Level, ComputeTask> tasks;
    };
    
    //////////////////////
    // FOR DEBUG DEMO ONLY
    static inline std::ostream& operator<<(std::ostream& os, Level l) {
        switch(l) {
            case Level::One: return os << "One";
            case Level::Two: return os << "Two";
            case Level::Three: return os << "Three";
            case Level::LEVEL: return os << "LEVEL";
        }
        return os << "?";
    }
    
    static inline std::ostream& operator<<(std::ostream& os, ComputeTask const& task) {
        os << task.level << ' ' << task.aggregator << ' ' << task.dataname << " on (";
        copy(begin(task.groupby), end(task.groupby), std::experimental::make_ostream_joiner(os, ", "));
        return os << ')';
    }
    
    /////////////
    // FOR PARSER
    BOOST_FUSION_ADAPT_STRUCT(ComputeTask, level, aggregator, dataname, groupby)
    BOOST_FUSION_ADAPT_STRUCT(Schedule, tasks)
    
    namespace qi = boost::spirit::qi;
    
    template <typename It>
    struct Parser : qi::grammar<It, Schedule()> {
        Parser() : Parser::base_type(_start) {
            using namespace qi;
    
            _any_word    = lexeme [ +char_("a-zA-Z0-9_./-") ]; // '-' last: a literal dash, not the range '9'-'_'
            _operation   = _any_word; // TODO
            _group_field = _any_word; // TODO
            _dataname    = _any_word; // TODO
    
            _level       = no_case [ _level_sym ];
            _groupby     = '(' >> -(_group_field % ',') >> ')';
            _task        = _level >> _operation >> _dataname >> "on" >> _groupby;
            _entry       = _level >> ':' >> _task;
            _schedule    = _entry % eol;
            _start       = skip(blank) [ _schedule ];
    
            BOOST_SPIRIT_DEBUG_NODES((_start)(_schedule)(_task)(_groupby)(_level)(_operation)(_dataname)(_group_field))
        }
      private:
        struct level_sym : qi::symbols<char, Level> {
            level_sym() { this->add
                ("one", Level::One)
                ("two", Level::Two)
                ("three", Level::Three)
                ("level", Level::LEVEL);
            }
        } _level_sym;
    
        // lexemes
        qi::rule<It, std::string()> _any_word;
        qi::rule<It, std::string()> _operation, _dataname, _group_field; // TODO
        qi::rule<It, Level()> _level;
    
        using Skipper = qi::blank_type;
        using Table = decltype(Schedule::tasks);
        using Entry = std::pair<Level, ComputeTask>;
    
        qi::rule<It, std::set<std::string>(), Skipper> _groupby;
        qi::rule<It, ComputeTask(), Skipper> _task;
        qi::rule<It, Entry(), Skipper> _entry;
        qi::rule<It, Table(), Skipper> _schedule;
        qi::rule<It, Schedule()> _start;
    };
    
    int main() {
        Parser<std::string::const_iterator> const parser;
    
        for (std::string const input : { R"(ONE: level operation name on(data1, data2, data3)
    TWO: level operation name on()
    THREE: level operation name on(data1, data2))" })
        {
            auto f = begin(input), l = end(input);
            Schedule s;
            if (parse(f, l, parser, s)) {
                std::cout << "Parsed\n";
                for (auto& [level, task] : s.tasks) {
                    std::cout << level << ": " << task << "\n";
                }
            } else {
                std::cout << "Failed\n";
            }
    
            if (f != l) {
                std::cout << "Remaining unparsed input: " << std::quoted(std::string(f,l)) << "\n";
            }
        }
    }