Parsing a number of named sets of other named sets

So I want to write a... well... not-so-simple parser with boost::spirit::qi. I know the bare basics of boost spirit, having gotten acquainted with it for the first time in the past couple of hours.

Basically I need to parse this:

# comment

# other comment

set "Myset A"
{
    figure "AF 1"
    {
        i 0 0 0
        i 1 2 5
        i 1 1 1
        f 3.1 45.11 5.3
        i 3 1 5
        f 1.1 2.33 5.166
    }

    figure "AF 2"
    {
        i 25 5 1
        i 3 1 3
    }
}

# comment

set "Myset B"
{
    figure "BF 1"
    {
        f 23.1 4.3 5.11
    }
}

set "Myset C"
{
    include "Myset A" # includes all figures from Myset A

    figure "CF"
    {
        i 1 1 1
        f 3.11 5.33 3
    }
}

Into this:

struct int_point { int x, y, z; };
struct float_point { float x, y, z; };

struct figure
{
    string name;
    vector<int_point> int_points;
    vector<float_point> float_points;
};

struct figure_set
{
    string name;
    vector<figure> figures;
};

vector<figure_set> figure_sets; // fill with the data of the input

Now, obviously having somebody write it for me would be too much, but can you please provide some tips on what to read and how to structure the grammar and parsers for this task?

And also... it may be the case that boost::spirit is not the best library I could use for the task. If so, which one is?

EDIT: Here's how far I've gotten, but I'm not yet sure how to go on.

I am able to read a single figure, but I don't yet know how to parse a set of figures.


  • Here's my take on it

    I believe the rule that blocked you was this one:

    figure  = eps >> "figure" 
        >> name         [ at_c<0>(_val) = _1 ] >> '{' >> 
        *(
                ipoints [ push_back(at_c<1>(_val), _1) ]
              | fpoints [ push_back(at_c<2>(_val), _1) ]
         ) >> '}';

    The need for semantic actions here is a symptom of parsing intermixed i and f lines into two separate containers.

    See below for an alternative.

    Here's my full code: test.cpp

    //#define BOOST_SPIRIT_DEBUG // before including Spirit
    #include <boost/fusion/adapted.hpp>
    #include <boost/spirit/include/qi.hpp>
    #include <boost/spirit/include/karma.hpp>
    #include <boost/spirit/include/phoenix.hpp>
    #include <boost/spirit/include/phoenix_fusion.hpp>
    #include <fstream>
    #include <iostream>
    #include <iterator>
    #include <set>
    #include <string>
    #include <vector>

    namespace Format
    {
        struct int_point   { int x, y, z;   }; 
        struct float_point { float x, y, z; }; 

        struct figure
        {
            std::string              name;
            std::vector<int_point>   int_points;
            std::vector<float_point> float_points;

            friend std::ostream& operator<<(std::ostream& os, figure const& o);
        };

        struct figure_set
        {
            std::string           name;
            std::set<std::string> includes;
            std::vector<figure>   figures;

            friend std::ostream& operator<<(std::ostream& os, figure_set const& o);
        };

        typedef std::vector<figure_set> file_data;
    }

    // Adapt the structs as Fusion sequences so Qi and Karma can access their members
    BOOST_FUSION_ADAPT_STRUCT(Format::int_point,
            (int, x)(int, y)(int, z))
    BOOST_FUSION_ADAPT_STRUCT(Format::float_point,
            (float, x)(float, y)(float, z))
    BOOST_FUSION_ADAPT_STRUCT(Format::figure,
            (std::string, name)
            (std::vector<Format::int_point>, int_points)
            (std::vector<Format::float_point>, float_points))
    BOOST_FUSION_ADAPT_STRUCT(Format::figure_set,
            (std::string, name)
            (std::set<std::string>, includes)
            (std::vector<Format::figure>, figures))

    namespace Format
    {
        // Write a figure back out in the input format (int points first, then float points)
        std::ostream& operator<<(std::ostream& os, figure const& o)
        {
            using namespace boost::spirit::karma;
            return os << format_delimited(
                    "\n    figure" << no_delimit [ '"' << string << '"' ] << "\n    {"
                    << *("\n       i" << int_ << int_ << int_)
                    << *("\n       f" << float_ << float_ << float_)
                    << "\n    }"
                    , ' ', o);
        }

        // Write a whole set: its includes, then its figures (via the operator<< above)
        std::ostream& operator<<(std::ostream& os, figure_set const& o)
        {
            using namespace boost::spirit::karma;
            return os << format_delimited(
                    "\nset" << no_delimit [ '"' << string << '"' ] << "\n{"
                    << *("\n    include " << no_delimit [ '"' << string << '"' ])
                    << *stream
                    << "\n}"
                    , ' ', o);
        }
    }

    namespace /*anon*/
    {
        namespace phx=boost::phoenix;
        namespace qi =boost::spirit::qi;

        // Skipper: whitespace and '#' comments (up to end of line) are insignificant
        template <typename Iterator> struct skipper
            : public qi::grammar<Iterator>
        {
            skipper() : skipper::base_type(start, "skipper")
            {
                using namespace qi;
                comment = '#' >> *(char_ - eol) >> (eol|eoi);
                start   = comment | qi::space;
            }

            qi::rule<Iterator> start, comment;
        };

        template <typename Iterator> struct parser
            : public qi::grammar<Iterator, Format::file_data(), skipper<Iterator> >
        {
            parser() : parser::base_type(start, "parser")
            {
                using namespace qi;
                using phx::push_back;
                using phx::at_c;

                name    = eps >> lexeme [ '"' >> *~char_('"') >> '"' ];
                include = eps >> "include" >> name;
                ipoints = eps >> "i"       >> int_         >> int_   >> int_;
                fpoints = eps >> "f"       >> float_       >> float_ >> float_;
                // semantic actions route each point line into the matching container
                figure  = eps >> "figure" 
                    >> name         [ at_c<0>(_val) = _1 ] >> '{' >> 
                    *(
                            ipoints [ push_back(at_c<1>(_val), _1) ]
                          | fpoints [ push_back(at_c<2>(_val), _1) ]
                     ) >> '}';
                set     = eps >> "set" >> name >> '{' >> *include >> *figure >> '}';
                start   = *set;
            }

            qi::rule<Iterator, std::string()        , skipper<Iterator> > name, include;
            qi::rule<Iterator, Format::int_point()  , skipper<Iterator> > ipoints;
            qi::rule<Iterator, Format::float_point(), skipper<Iterator> > fpoints;
            qi::rule<Iterator, Format::figure()     , skipper<Iterator> > figure;
            qi::rule<Iterator, Format::figure_set() , skipper<Iterator> > set;
            qi::rule<Iterator, Format::file_data()  , skipper<Iterator> > start;
        };
    }

    namespace Parser {
        bool parsefile(const std::string& spec, Format::file_data& data)
        {
            std::ifstream in(spec.c_str());
            if (!in) 
                return false; // the file could not be opened

            // read the whole file into memory
            std::string v;
            v.insert(v.end(), std::istreambuf_iterator<char>(in.rdbuf()), std::istreambuf_iterator<char>());

            typedef char const * iterator_type;
            iterator_type first = &v[0];
            iterator_type last = first+v.size();

            try
            {
                parser<iterator_type>  p;
                skipper<iterator_type> s;

                bool r = qi::phrase_parse(first, last, p, s, data);

                r = r && (first == last); // require that all input was consumed
                if (!r)
                    std::cerr << spec << ": parsing failed at: \"" << std::string(first, last) << "\"\n";
                return r;
            }
            catch (const qi::expectation_failure<char const *>& e)
            {
                std::cerr << "FIXME: expected " << e.what_ << ", got '" << std::string(e.first, e.last) << "'" << std::endl;
                return false;
            }
        }
    }

    int main()
    {
        Format::file_data data;
        bool ok = Parser::parsefile("input.txt", data);

        std::cerr << "Parse " << (ok?"success":"failed") << std::endl;
        std::cout << "# figure sets exported automatically by karma\n\n";
        for (auto& set : data)
            std::cout << set;
    }

    It outputs the parsed data as a verification: output.txt

    Parse success
    # figure sets exported automatically by karma
    set "Myset A"
    {
        figure "AF 1"
        {
           i 0 0 0 
           i 1 2 5 
           i 1 1 1 
           i 3 1 5 
           f 3.1 45.11 5.3 
           f 1.1 2.33 5.166 
        }
        figure "AF 2"
        {
           i 25 5 1 
           i 3 1 3 
        }
    }
    set "Myset B"
    {
        figure "BF 1"
        {
           f 23.1 4.3 5.11 
        }
    }
    set "Myset C"
    {
        include  "Myset A"
        figure "CF"
        {
           i 1 1 1 
           f 3.11 5.33 3.0 
        }
    }

    You will note that

    • the order of the point lines has changed (all int_points precede all float_points)
    • non-significant digits are added, e.g. 3.0 instead of 3 in the last line, to show that the type is float
    • you had 'forgotten' (?) about the includes in the structs in your question, so figure_set gained an includes member
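
    The parser above only records the include names; it does not copy any figures. As a rough sketch of one way to expand them after parsing, the code below uses simplified stand-ins for the Format structs (points omitted) and a hypothetical helper, resolve_includes, that is not part of the original code. It assumes a set may only include sets that appear earlier in the file, which the question does not actually specify.

    ```cpp
    #include <map>
    #include <set>
    #include <stdexcept>
    #include <string>
    #include <vector>

    // Simplified stand-ins for Format::figure and Format::figure_set (points omitted).
    struct figure { std::string name; };

    struct figure_set
    {
        std::string           name;
        std::set<std::string> includes;
        std::vector<figure>   figures;
    };

    // Hypothetical helper: returns a copy of the sets in which the figures of each
    // included set are placed in front of the set's own figures.
    // Assumption: an include may only name a set that appears earlier in the file.
    std::vector<figure_set> resolve_includes(std::vector<figure_set> sets)
    {
        std::map<std::string, std::vector<figure> > resolved; // set name -> expanded figures

        for (auto& s : sets)
        {
            std::vector<figure> expanded;
            for (auto const& inc : s.includes)
            {
                auto it = resolved.find(inc);
                if (it == resolved.end())
                    throw std::runtime_error("set \"" + s.name + "\" includes unknown set \"" + inc + "\"");
                expanded.insert(expanded.end(), it->second.begin(), it->second.end());
            }
            expanded.insert(expanded.end(), s.figures.begin(), s.figures.end());

            s.figures = expanded;
            s.includes.clear();           // nothing left to resolve for this set
            resolved[s.name] = s.figures; // later sets may include this one
        }
        return sets;
    }
    ```

    Applied to the sample input, "Myset C" would end up with the figures "AF 1", "AF 2" and "CF". Whether included figures should come first, or whether forward references should be allowed, is a design decision the input format leaves open.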


    As an alternative, use a data structure that keeps the point lines in their original order:

    typedef boost::variant<int_point, float_point> if_point;

    struct figure
    {
        std::string            name;
        std::vector<if_point>  if_points;
    };
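
    Code that consumes such a vector has to find out which kind of point each element holds. A minimal sketch of the usual boost::variant approach is shown below, using boost::static_visitor and boost::apply_visitor. The names point_printer and dump_points are made up for this illustration and do not appear in the original code.

    ```cpp
    #include <boost/variant.hpp>
    #include <sstream>
    #include <string>
    #include <vector>

    struct int_point   { int x, y, z;   };
    struct float_point { float x, y, z; };

    typedef boost::variant<int_point, float_point> if_point;

    // Visitor that renders one point as a line in the input format.
    // boost::apply_visitor picks the overload matching the stored type.
    struct point_printer : boost::static_visitor<std::string>
    {
        std::string operator()(int_point const& p) const
        {
            std::ostringstream os;
            os << "i " << p.x << ' ' << p.y << ' ' << p.z;
            return os.str();
        }

        std::string operator()(float_point const& p) const
        {
            std::ostringstream os;
            os << "f " << p.x << ' ' << p.y << ' ' << p.z;
            return os.str();
        }
    };

    // Hypothetical helper: one line per point, in the order the points are stored.
    std::string dump_points(std::vector<if_point> const& points)
    {
        std::string out;
        for (auto const& p : points)
            out += boost::apply_visitor(point_printer(), p) + "\n";
        return out;
    }
    ```

    Because both kinds of point live in one container, iterating over it preserves the order in which the i and f lines were stored.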

    Now the rules become simply:

    name    = eps >> lexeme [ '"' >> *~char_('"') >> '"' ];
    include = eps >> "include" >> name;
    ipoints = eps >> "i"       >> int_         >> int_   >> int_;
    fpoints = eps >> "f"       >> float_       >> float_ >> float_;
    figure  = eps >> "figure" >> name >> '{' >> *(ipoints | fpoints) >> '}';
    set     = eps >> "set"    >> name >> '{' >> *include >> *figure  >> '}';
    start   = *set;

    Note the elegance in

    figure  = eps >> "figure" >> name >> '{' >> *(ipoints | fpoints) >> '}';

    And the output stays in the exact order of the input: output.txt

    Once again, full demo code (on github only): test.cpp
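
    Since the full demo is not reproduced here, the following is a cut-down sketch of the variant-based approach, not the linked code. It parses a single figure only, uses plain qi::space as the skipper (so no '#' comments), and drops the eps prefixes. figure_grammar and parse_figure are names invented for this sketch.

    ```cpp
    #include <boost/fusion/include/adapt_struct.hpp>
    #include <boost/spirit/include/qi.hpp>
    #include <boost/variant.hpp>
    #include <string>
    #include <vector>

    namespace qi = boost::spirit::qi;

    struct int_point   { int x, y, z;   };
    struct float_point { float x, y, z; };
    typedef boost::variant<int_point, float_point> if_point;

    struct figure
    {
        std::string           name;
        std::vector<if_point> if_points;
    };

    BOOST_FUSION_ADAPT_STRUCT(int_point,   (int, x)(int, y)(int, z))
    BOOST_FUSION_ADAPT_STRUCT(float_point, (float, x)(float, y)(float, z))
    BOOST_FUSION_ADAPT_STRUCT(figure,      (std::string, name)(std::vector<if_point>, if_points))

    // Grammar for a single figure; no semantic actions are needed because
    // *(ipoints | fpoints) synthesizes a vector of variants directly.
    template <typename Iterator>
    struct figure_grammar : qi::grammar<Iterator, figure(), qi::space_type>
    {
        figure_grammar() : figure_grammar::base_type(start)
        {
            using namespace qi;
            name    = lexeme['"' >> *~char_('"') >> '"'];
            ipoints = "i" >> int_   >> int_   >> int_;
            fpoints = "f" >> float_ >> float_ >> float_;
            start   = "figure" >> name >> '{' >> *(ipoints | fpoints) >> '}';
        }

        qi::rule<Iterator, std::string(), qi::space_type> name;
        qi::rule<Iterator, int_point(),   qi::space_type> ipoints;
        qi::rule<Iterator, float_point(), qi::space_type> fpoints;
        qi::rule<Iterator, figure(),      qi::space_type> start;
    };

    // Hypothetical helper: true only if the whole text is one well-formed figure.
    bool parse_figure(std::string const& text, figure& out)
    {
        std::string::const_iterator first = text.begin(), last = text.end();
        figure_grammar<std::string::const_iterator> g;
        bool ok = qi::phrase_parse(first, last, g, qi::space, out);
        return ok && first == last;
    }
    ```

    Extending this to whole files would mean adding the set and include rules and swapping in the comment-aware skipper from the first listing.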

    Bonus update

    Finally, I made my first proper Karma grammar to output the results:

    name    = no_delimit ['"' << string << '"'];
    include = "include" << name;
    ipoints = "\n        i" << int_   << int_   << int_;
    fpoints = "\n        f" << float_ << float_ << float_;
    figure  = "figure" << name << "\n    {" << *(ipoints | fpoints) << "\n    }";
    set     = "set"    << name << "\n{" 
                << *("\n   " << include)
                << *("\n   " << figure)  << "\n}";
    start   = "# figure sets exported automatically by karma\n\n" 
                << set % eol;

    That was actually considerably more comfortable than I had expected. See it in the latest version of the fully updated gist: test.hpp