Boost Spirit parsing XML like grammar


I've got the following piece of code that I want to fix, but I am totally new to Boost Spirit. I know regular expressions, but not how to express them in Spirit.

This is the parser.

Parser() : Parser::base_type(root)
{
    braces  = lit('{') >> *(iso8859_1::space) >> lit('}');
    str     = lexeme[lit('"') >> raw[*(~iso8859_1::char_('"'))] >> lit('"')];
    tolleaf = raw[+(~iso8859_1::char_("\"{}= \t\r\n"))];
    leaf    = raw[+(iso8859_1::alnum | iso8859_1::char_("-._:"))];
    taglist = lit('{') >> omit[*(iso8859_1::space)] >> lexeme[( ( str | skip[tolleaf] ) % *(iso8859_1::space) )] >> omit[*(iso8859_1::space)] >> lit('}');
    object  = raw[lit('{') >> *(root) >> *(iso8859_1::space) >> lit('}')];
    objlist = raw[lit('{') >> *( *(iso8859_1::space) >> object[&pushObj] ) >> *(iso8859_1::space) >> lit('}')];
    assign  = raw[(*(iso8859_1::space) >> ( leaf[&setLHS] | str[&setLHS]) >> *(iso8859_1::space) >> lit('=')
        >> *(iso8859_1::space) 
        >> ( leaf[&setRHSleaf] | str[&setRHSleaf] | taglist[&setRHStaglist] | objlist[&setRHSobjlist] | object[&setRHSobject] ) 
        >> *(iso8859_1::space))];
    root    = +(assign | braces);

    str.name("str");
    leaf.name("leaf");
    taglist.name("taglist");
    object.name("object");
    objlist.name("objlist");
    assign.name("assign");
    braces.name("braces");
    root.name("root");

}

And this is the format I'm trying to parse:

employees=
{
{
    province_pop_id=
    {
    province_id=1
    index=0
    type=9
    }
    count=1750
}

{
    province_pop_id=
    {
    province_id=1
    index=1
    type=9
    }
    count=34
}
}

The problem is the double {{. If I just have something like

blahblah=
{
    value=
    {
        2
    }
}

it works fine.

I know that this rule

 objlist = raw[lit('{') >> *( *(iso8859_1::space) >> object[&pushObj] ) >> *(iso8859_1::space) >> lit('}')];

has to be changed, but I'm not sure how.


Solution

  • So, just to show you what I meant, I've cleaned up the grammar.

    Parser() : Parser::base_type(root)
    {
        using namespace qi::iso8859_1;
    
        braces  = 
            '{' >> qi::eps >> '}'
            ;
        str     = qi::lexeme [
                 '"'
              >> *~char_('"')
              >> '"'
            ]
            ;
        tolleaf = qi::lexeme [
                +(~char_("\"{}= \t\r\n"))
            ]
            ;
        leaf    = qi::lexeme [
                +(alnum | char_("-._:"))
            ]
            ;
        taglist = 
               '{'
            >> -str % tolleaf
            >> '}'
            ;
        object  = 
                 '{'
              >> *root
              >> '}'
            ;
        objlist = 
                 '{'
              >> *object
              >> '}'
            ;
        assign  = 
                 (leaf | str)
              >> '='
              >> (leaf | str | taglist | objlist | object) 
            ;
        root    = 
            +(assign | braces)
            ;
    
        BOOST_SPIRIT_DEBUG_NODES((root)(braces)(str)(tolleaf)(leaf)(taglist)(objlist)(object)(assign));
    }
    

    The original contained quite a few surprising things:

    • Redundant whitespace checking (*(iso8859_1::space) everywhere), even though the skipper already does this
    • The presence of skip[] and lexeme[] clearly suggests that the rules were declared with a Skipper (if not, all rules are implicitly lexemes)
    • Formatting!

      • Using-directives (using namespace ...) help, of course.
      • qi::lit is only required where there would otherwise be an ambiguity, or where overload resolution needs it
      • Many redundant parentheses
      • Putting everything on a single line makes for incomprehensible rules.
        The proposed layout also makes it easier to track down compilation problems by commenting out individual lines
    • The BOOST_SPIRIT_DEBUG* macros are there for debugging; see the output below the full working sample

    Note that I have NOT looked closely at the actual grammar. It could probably be improved as well, but I don't have time to work out the intended grammar. However, as you can see, it parses the snippet from the question:

    Full Code

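    #define BOOST_SPIRIT_DEBUG
    #include <boost/spirit/include/qi.hpp>
    #include <boost/spirit/include/phoenix.hpp>
    #include <boost/spirit/include/support_istream_iterator.hpp> // boost::spirit::istream_iterator
    #include <iostream>
    #include <string>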
    
    namespace qi        = boost::spirit::qi;
    
    template <typename It, typename Skipper = qi::iso8859_1::space_type>
        struct Parser : qi::grammar<It, Skipper>
    {
        Parser() : Parser::base_type(root)
        {
            using namespace qi::iso8859_1;
    
            braces  = 
                '{' >> qi::eps >> '}'
                ;
            str     = qi::lexeme [
                     '"'
                  >> *~char_('"')
                  >> '"'
                ]
                ;
            tolleaf = qi::lexeme [
                    +(~char_("\"{}= \t\r\n"))
                ]
                ;
            leaf    = qi::lexeme [
                    +(alnum | char_("-._:"))
                ]
                ;
            taglist = 
                   '{'
                >> -str % tolleaf
                >> '}'
                ;
            object  = 
                     '{'
                  >> *root
                  >> '}'
                ;
            objlist = 
                     '{'
                  >> *object
                  >> '}'
                ;
            assign  = 
                     (leaf | str)
                  >> '='
                  >> (leaf | str | taglist | objlist | object) 
                ;
            root    = 
                +(assign | braces)
                ;
    
            BOOST_SPIRIT_DEBUG_NODES((root)(braces)(str)(tolleaf)(leaf)(taglist)(objlist)(object)(assign));
        }
    
      private:
        qi::rule<It, Skipper> root, braces, str, tolleaf, leaf, taglist, objlist, object, assign;
    };
    
    int main()
    {
        typedef boost::spirit::istream_iterator It;
        std::cin.unsetf(std::ios::skipws);
        It f(std::cin), l;
    
        namespace iso8859_1 = qi::iso8859_1;
        Parser<It, iso8859_1::space_type> p;
    
        try
        {
            bool ok = qi::phrase_parse(f,l,p,iso8859_1::space);
            if (ok)   std::cout << "parse success\n";
            else      std::cerr << "parse failed: '" << std::string(f,l) << "'\n";
    
            if (f!=l) std::cerr << "trailing unparsed: '" << std::string(f,l) << "'\n";
            return ok ? 0 : 1; // exit status 0 means success
        } catch(const qi::expectation_failure<It>& e)
        {
            std::string frag(e.first, e.last);
            std::cerr << e.what() << "'" << frag << "'\n";
        }
    
        return 1;
    }
    

    Output

    This is what BOOST_SPIRIT_DEBUG prints:

    <root>
      <try>employees=\n{\n{\n    p</try>
      <assign>
        <try>employees=\n{\n{\n    p</try>
        <leaf>
          <try>employees=\n{\n{\n    p</try>
          <success>=\n{\n{\n    province_p</success>
          <attributes>[]</attributes>
        </leaf>
        <leaf>
          <try>\n{\n{\n    province_po</try>
          <fail/>
        </leaf>
        <str>
          <try>{\n{\n    province_pop</try>
          <fail/>
        </str>
        <taglist>
          <try>{\n{\n    province_pop</try>
          <str>
            <try>\n{\n    province_pop_</try>
            <fail/>
          </str>
          <tolleaf>
            <try>{\n    province_pop_i</try>
            <fail/>
          </tolleaf>
          <fail/>
        </taglist>
        <objlist>
          <try>{\n{\n    province_pop</try>
          <object>
            <try>\n{\n    province_pop_</try>
            <root>
              <try>\n    province_pop_id</try>
              <assign>
                <try>\n    province_pop_id</try>
                <leaf>
                  <try>\n    province_pop_id</try>
                  <success>=\n    {\n    province</success>
                  <attributes>[]</attributes>
                </leaf>
                <leaf>
                  <try>\n    {\n    province_</try>
                  <fail/>
                </leaf>
                <str>
                  <try>{\n    province_id=1\n</try>
                  <fail/>
                </str>
                <taglist>
                  <try>{\n    province_id=1\n</try>
                  <str>
                    <try>\n    province_id=1\n </try>
                    <fail/>
                  </str>
                  <tolleaf>
                    <try>province_id=1\n    in</try>
                    <success>=1\n    index=0\n    t</success>
                    <attributes>[]</attributes>
                  </tolleaf>
                  <str>
                    <try>=1\n    index=0\n    t</try>
                    <fail/>
                  </str>
                  <tolleaf>
                    <try>=1\n    index=0\n    t</try>
                    <fail/>
                  </tolleaf>
                  <fail/>
                </taglist>
                <objlist>
                  <try>{\n    province_id=1\n</try>
                  <object>
                    <try>\n    province_id=1\n </try>
                    <fail/>
                  </object>
                  <fail/>
                </objlist>
                <object>
                  <try>{\n    province_id=1\n</try>
                  <root>
                    <try>\n    province_id=1\n </try>
                    <assign>
                      <try>\n    province_id=1\n </try>
                      <leaf>
                        <try>\n    province_id=1\n </try>
                        <success>=1\n    index=0\n    t</success>
                        <attributes>[]</attributes>
                      </leaf>
                      <leaf>
                        <try>1\n    index=0\n    ty</try>
                        <success>\n    index=0\n    typ</success>
                        <attributes>[]</attributes>
                      </leaf>
                      <success>\n    index=0\n    typ</success>
                      <attributes>[]</attributes>
                    </assign>
                    <assign>
                      <try>\n    index=0\n    typ</try>
                      <leaf>
                        <try>\n    index=0\n    typ</try>
                        <success>=0\n    type=9\n    }\n</success>
                        <attributes>[]</attributes>
                      </leaf>
                      <leaf>
                        <try>0\n    type=9\n    }\n </try>
                        <success>\n    type=9\n    }\n  </success>
                        <attributes>[]</attributes>
                      </leaf>
                      <success>\n    type=9\n    }\n  </success>
                      <attributes>[]</attributes>
                    </assign>
                    <assign>
                      <try>\n    type=9\n    }\n  </try>
                      <leaf>
                        <try>\n    type=9\n    }\n  </try>
                        <success>=9\n    }\n    count=1</success>
                        <attributes>[]</attributes>
                      </leaf>
                      <leaf>
                        <try>9\n    }\n    count=17</try>
                        <success>\n    }\n    count=175</success>
                        <attributes>[]</attributes>
                      </leaf>
                      <success>\n    }\n    count=175</success>
                      <attributes>[]</attributes>
                    </assign>
                    <assign>
                      <try>\n    }\n    count=175</try>
                      <leaf>
                        <try>\n    }\n    count=175</try>
                        <fail/>
                      </leaf>
                      <str>
                        <try>}\n    count=1750\n}\n\n</try>
                        <fail/>
                      </str>
                      <fail/>
                    </assign>
                    <braces>
                      <try>\n    }\n    count=175</try>
                      <fail/>
                    </braces>
                    <success>\n    }\n    count=175</success>
                    <attributes>[]</attributes>
                  </root>
                  <root>
                    <try>\n    }\n    count=175</try>
                    <assign>
                      <try>\n    }\n    count=175</try>
                      <leaf>
                        <try>\n    }\n    count=175</try>
                        <fail/>
                      </leaf>
                      <str>
                        <try>}\n    count=1750\n}\n\n</try>
                        <fail/>
                      </str>
                      <fail/>
                    </assign>
                    <braces>
                      <try>\n    }\n    count=175</try>
                      <fail/>
                    </braces>
                    <fail/>
                  </root>
                  <success>\n    count=1750\n}\n\n{</success>
                  <attributes>[]</attributes>
                </object>
                <success>\n    count=1750\n}\n\n{</success>
                <attributes>[]</attributes>
              </assign>
              <assign>
                <try>\n    count=1750\n}\n\n{</try>
                <leaf>
                  <try>\n    count=1750\n}\n\n{</try>
                  <success>=1750\n}\n\n{\n    provi</success>
                  <attributes>[]</attributes>
                </leaf>
                <leaf>
                  <try>1750\n}\n\n{\n    provin</try>
                  <success>\n}\n\n{\n    province_p</success>
                  <attributes>[]</attributes>
                </leaf>
                <success>\n}\n\n{\n    province_p</success>
                <attributes>[]</attributes>
              </assign>
              <assign>
                <try>\n}\n\n{\n    province_p</try>
                <leaf>
                  <try>\n}\n\n{\n    province_p</try>
                  <fail/>
                </leaf>
                <str>
                  <try>}\n\n{\n    province_po</try>
                  <fail/>
                </str>
                <fail/>
              </assign>
              <braces>
                <try>\n}\n\n{\n    province_p</try>
                <fail/>
              </braces>
              <success>\n}\n\n{\n    province_p</success>
              <attributes>[]</attributes>
            </root>
            <root>
              <try>\n}\n\n{\n    province_p</try>
              <assign>
                <try>\n}\n\n{\n    province_p</try>
                <leaf>
                  <try>\n}\n\n{\n    province_p</try>
                  <fail/>
                </leaf>
                <str>
                  <try>}\n\n{\n    province_po</try>
                  <fail/>
                </str>
                <fail/>
              </assign>
              <braces>
                <try>\n}\n\n{\n    province_p</try>
                <fail/>
              </braces>
              <fail/>
            </root>
            <success>\n\n{\n    province_pop</success>
            <attributes>[]</attributes>
          </object>
          <object>
            <try>\n\n{\n    province_pop</try>
            <root>
              <try>\n    province_pop_id</try>
              <assign>
                <try>\n    province_pop_id</try>
                <leaf>
                  <try>\n    province_pop_id</try>
                  <success>=\n    {\n    province</success>
                  <attributes>[]</attributes>
                </leaf>
                <leaf>
                  <try>\n    {\n    province_</try>
                  <fail/>
                </leaf>
                <str>
                  <try>{\n    province_id=1\n</try>
                  <fail/>
                </str>
                <taglist>
                  <try>{\n    province_id=1\n</try>
                  <str>
                    <try>\n    province_id=1\n </try>
                    <fail/>
                  </str>
                  <tolleaf>
                    <try>province_id=1\n    in</try>
                    <success>=1\n    index=1\n    t</success>
                    <attributes>[]</attributes>
                  </tolleaf>
                  <str>
                    <try>=1\n    index=1\n    t</try>
                    <fail/>
                  </str>
                  <tolleaf>
                    <try>=1\n    index=1\n    t</try>
                    <fail/>
                  </tolleaf>
                  <fail/>
                </taglist>
                <objlist>
                  <try>{\n    province_id=1\n</try>
                  <object>
                    <try>\n    province_id=1\n </try>
                    <fail/>
                  </object>
                  <fail/>
                </objlist>
                <object>
                  <try>{\n    province_id=1\n</try>
                  <root>
                    <try>\n    province_id=1\n </try>
                    <assign>
                      <try>\n    province_id=1\n </try>
                      <leaf>
                        <try>\n    province_id=1\n </try>
                        <success>=1\n    index=1\n    t</success>
                        <attributes>[]</attributes>
                      </leaf>
                      <leaf>
                        <try>1\n    index=1\n    ty</try>
                        <success>\n    index=1\n    typ</success>
                        <attributes>[]</attributes>
                      </leaf>
                      <success>\n    index=1\n    typ</success>
                      <attributes>[]</attributes>
                    </assign>
                    <assign>
                      <try>\n    index=1\n    typ</try>
                      <leaf>
                        <try>\n    index=1\n    typ</try>
                        <success>=1\n    type=9\n    }\n</success>
                        <attributes>[]</attributes>
                      </leaf>
                      <leaf>
                        <try>1\n    type=9\n    }\n </try>
                        <success>\n    type=9\n    }\n  </success>
                        <attributes>[]</attributes>
                      </leaf>
                      <success>\n    type=9\n    }\n  </success>
                      <attributes>[]</attributes>
                    </assign>
                    <assign>
                      <try>\n    type=9\n    }\n  </try>
                      <leaf>
                        <try>\n    type=9\n    }\n  </try>
                        <success>=9\n    }\n    count=3</success>
                        <attributes>[]</attributes>
                      </leaf>
                      <leaf>
                        <try>9\n    }\n    count=34</try>
                        <success>\n    }\n    count=34\n</success>
                        <attributes>[]</attributes>
                      </leaf>
                      <success>\n    }\n    count=34\n</success>
                      <attributes>[]</attributes>
                    </assign>
                    <assign>
                      <try>\n    }\n    count=34\n</try>
                      <leaf>
                        <try>\n    }\n    count=34\n</try>
                        <fail/>
                      </leaf>
                      <str>
                        <try>}\n    count=34\n}\n}\n</try>
                        <fail/>
                      </str>
                      <fail/>
                    </assign>
                    <braces>
                      <try>\n    }\n    count=34\n</try>
                      <fail/>
                    </braces>
                    <success>\n    }\n    count=34\n</success>
                    <attributes>[]</attributes>
                  </root>
                  <root>
                    <try>\n    }\n    count=34\n</try>
                    <assign>
                      <try>\n    }\n    count=34\n</try>
                      <leaf>
                        <try>\n    }\n    count=34\n</try>
                        <fail/>
                      </leaf>
                      <str>
                        <try>}\n    count=34\n}\n}\n</try>
                        <fail/>
                      </str>
                      <fail/>
                    </assign>
                    <braces>
                      <try>\n    }\n    count=34\n</try>
                      <fail/>
                    </braces>
                    <fail/>
                  </root>
                  <success>\n    count=34\n}\n}\n</success>
                  <attributes>[]</attributes>
                </object>
                <success>\n    count=34\n}\n}\n</success>
                <attributes>[]</attributes>
              </assign>
              <assign>
                <try>\n    count=34\n}\n}\n</try>
                <leaf>
                  <try>\n    count=34\n}\n}\n</try>
                  <success>=34\n}\n}\n</success>
                  <attributes>[]</attributes>
                </leaf>
                <leaf>
                  <try>34\n}\n}\n</try>
                  <success>\n}\n}\n</success>
                  <attributes>[]</attributes>
                </leaf>
                <success>\n}\n}\n</success>
                <attributes>[]</attributes>
              </assign>
              <assign>
                <try>\n}\n}\n</try>
                <leaf>
                  <try>\n}\n}\n</try>
                  <fail/>
                </leaf>
                <str>
                  <try>}\n}\n</try>
                  <fail/>
                </str>
                <fail/>
              </assign>
              <braces>
                <try>\n}\n}\n</try>
                <fail/>
              </braces>
              <success>\n}\n}\n</success>
              <attributes>[]</attributes>
            </root>
            <root>
              <try>\n}\n}\n</try>
              <assign>
                <try>\n}\n}\n</try>
                <leaf>
                  <try>\n}\n}\n</try>
                  <fail/>
                </leaf>
                <str>
                  <try>}\n}\n</try>
                  <fail/>
                </str>
                <fail/>
              </assign>
              <braces>
                <try>\n}\n}\n</try>
                <fail/>
              </braces>
              <fail/>
            </root>
            <success>\n}\n</success>
            <attributes>[]</attributes>
          </object>
          <object>
            <try>\n}\n</try>
            <fail/>
          </object>
          <success>\n</success>
          <attributes>[]</attributes>
        </objlist>
        <success>\n</success>
        <attributes>[]</attributes>
      </assign>
      <assign>
        <try>\n</try>
        <leaf>
          <try>\n</try>
          <fail/>
        </leaf>
        <str>
          <try></try>
          <fail/>
        </str>
        <fail/>
      </assign>
      <braces>
        <try>\n</try>
        <fail/>
      </braces>
      <success>\n</success>
      <attributes>[]</attributes>
    </root>
    parse success