Tags: c++, parsing, boost-spirit, puredata

Why does this boost::spirit::qi rule fail to parse a string?


I'm writing a parser for PureData patches using Boost.Spirit (Qi) and C++.

I have the following simple test of parsing canvas records:

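#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/include/adapt_struct.hpp> // BOOST_FUSION_ADAPT_STRUCT
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>

namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;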

struct PdCanvas {
    int topX;
    int topY;
    int wX;
    int wY;
    std::string name;
    int openOnLoad; 
};

BOOST_FUSION_ADAPT_STRUCT(
    PdCanvas,
    (int, topX)
    (int, topY)
    (int, wX) 
    (int, wY)
    (std::string, name)
    (int, openOnLoad));


template <typename Iterator>
struct PdCanvasGrammar : qi::grammar<Iterator, PdCanvas(), ascii::space_type> {
    PdCanvasGrammar() : PdCanvasGrammar::base_type(canvasRule){
        
        canvasRule = qi::lit("#N canvas") >> qi::int_ >> qi::int_ >> qi::int_ >> qi::int_ >> +(qi::char_ - qi::space) >> qi::int_ >> ";";        

    }
    qi::rule<Iterator, PdCanvas(), ascii::space_type> canvasRule; 
   
};



int main(int argc, char** argv)
{
    if(argc != 2)
    {
        std::cout << "Usage: "  <<argv[0] << " <PatchFile>" << std::endl;
        exit(1); 
    }

    std::ifstream inputFile(argv[1]); 
    std::string inputString(std::istreambuf_iterator<char>(inputFile), {}); 

    PdCanvas root;
    PdCanvasGrammar<std::string::iterator> parser;
    std::cout << "Loaded file:\n " << inputString << std::endl;

    bool success = qi::phrase_parse(inputString.begin(), inputString.end(), parser, boost::spirit::ascii::space, root); 
    std::cout << "Success: " << success << std::endl;

    
    return 0; 

}

As one can see, the format of a canvas record is

#N canvas <int> <int> <int> <int> <string> <int>;

And that's what the rule should expect, but when I try to parse the following:

#N canvas 0 0 400 300 moo 1;

qi::phrase_parse returns false, indicating an unsuccessful parse.

As an aside, there is another form of the canvas grammar in PD, specifically for the root, which is of the form:

#N canvas <int> <int> <int> <int> <int>;

Which I have successfully parsed using a different rule, so my assumption is the problem comes from attempting to parse the string in the middle of the integers.

So my question is thus: What is wrong with my qi::rule and how can I change it to properly parse?


Solution

  • Two things:

    Greedy Parsing

    Note that PEG grammars are "greedy left-to-right": a repetition consumes as much as it can and never gives anything back. So you will want to make sure that the trailing int_ >> ";" is not parsed into the name:

    #include <boost/fusion/adapted.hpp>
    #include <boost/spirit/include/qi.hpp>
    #include <iomanip>
    namespace qi = boost::spirit::qi;
    
    struct PdCanvas { int topX, topY, wX, wY, openOnLoad; std::string name; };
    BOOST_FUSION_ADAPT_STRUCT(PdCanvas, topX, topY, wX, wY, name, openOnLoad);
    
    template <typename Iterator> struct PdCanvasGrammar : qi::grammar<Iterator, PdCanvas(), qi::space_type> {
        PdCanvasGrammar() : PdCanvasGrammar::base_type(canvasRule) {
    
            canvasRule =                            //
                "#N canvas" >> qi::int_ >> qi::int_ //
                >> qi::int_ >> qi::int_             //
                >> *(qi::char_ - (qi::int_ >> ';')) //
                >> qi::int_ >> ';'                  //
                ;
        }
    
        qi::rule<Iterator, PdCanvas(), qi::space_type> canvasRule;
    };
    
    int main() {
        PdCanvasGrammar<std::string::const_iterator> const parser;
    
        for (std::string const input :
             {
                 "#N canvas 0 0 400 300 moo 1;",
                 "#N canvas -10 -10 390 290 42 answers LtUaE -9;",
                 R"(#N canvas -10 -10 390 290 To be, or not to be, that is the question:
    Whether 'tis nobler in the mind to suffer
    The slings and arrows of outrageous fortune,
    Or to take Arms against a Sea of troubles,
    
    -9;)",
             }) //
        {
            // std::cout << "Input:\n " << quoted(input) << std::endl;
    
            if (PdCanvas root; phrase_parse(input.begin(), input.end(), parser, qi::space, root))
                std::cout << "Success -> " << boost::fusion::as_vector(root) << "\n";
            else
                std::cout << "Failed\n";
        }
    }
    

    Prints:

    Success -> (0 0 400 300 moo 1)
    Success -> (-10 -10 390 290 42answersLtUaE -9)
    Success -> (-10 -10 390 290 Tobe,ornottobe,thatisthequestion:Whether'tisnoblerinthemindtosufferTheslingsandarrowsofoutrageousfortune,OrtotakeArmsagainstaSeaoftroubles, -9)
    

    Skipping Whitespace

    I chose some outrageous "names" on purpose: the output above shows that every space inside them was dropped.

    Your rule has a skipper: space_type. By definition, this means that +(qi::char_ - qi::space) is equivalent to +qi::char_, because the skipper discards spaces before the expression ever sees them.

    To alleviate the issue, make sure that the space-sensitive expression does not execute under the skipper; see "Boost spirit skipper issues".

    Using lexeme[] here is the quickest solution:

        canvasRule =                                        //
            "#N canvas" >> qi::int_ >> qi::int_             //
            >> qi::int_ >> qi::int_                         //
            >> qi::lexeme[*(qi::char_ - (qi::int_ >> ';'))] //
            >> qi::int_ >> ';'                              //
            ;
    

    Prints:

    Success -> (0 0 400 300 moo  1)
    Success -> (-10 -10 390 290 42 answers LtUaE  -9)
    Success -> (-10 -10 390 290 To be, or not to be, that is the question:
    Whether 'tis nobler in the mind to suffer
    The slings and arrows of outrageous fortune,
    Or to take Arms against a Sea of troubles,
    
     -9)
    

    To also disallow space in the name, use qi::graph instead of qi::char_:

    Prints:

    Success -> (0 0 400 300 moo 1)
    Failed
    Failed
    

    Bonus Tips

    To make things easier to maintain, debug (!!) and also express intent, I'd

    • restructure the grammar using rules - some of which can be implicit lexemes
    • also encapsulate the skipper (the caller should not be dictating that)
    • make sure the entire input is matched (qi::eoi)

    // #define BOOST_SPIRIT_DEBUG
    #include <boost/fusion/adapted.hpp>
    #include <boost/spirit/include/qi.hpp>
    #include <iomanip>
    namespace qi = boost::spirit::qi;
    
    struct PdCanvas { int topX, topY, wX, wY, openOnLoad; std::string name; };
    BOOST_FUSION_ADAPT_STRUCT(PdCanvas, topX, topY, wX, wY, name, openOnLoad);
    
    template <typename Iterator> struct PdCanvasGrammar : qi::grammar<Iterator, PdCanvas()> {
        PdCanvasGrammar() : PdCanvasGrammar::base_type(start) {
            using namespace qi;
            start      = skip(space)[canvasRule >> eoi];
            name       = +graph;
            canvasRule = "#N canvas" >> int_ >> int_ >> int_ >> int_ >> name >> int_ >> ';';
    
            BOOST_SPIRIT_DEBUG_NODES((start)(canvasRule)(name))
        }
    
      private:
        qi::rule<Iterator, PdCanvas()>                 start;
        qi::rule<Iterator, PdCanvas(), qi::space_type> canvasRule;
        qi::rule<Iterator, std::string()> name;
    };
    
    int main() {
        PdCanvasGrammar<std::string::const_iterator> const parser;
    
        for (std::string const input :
             {
                 "#N canvas 0 0 400 300 foo 1;",
                 "#N canvas 0 0 400 300 bar 1;",
                 "#N canvas 0 0 400 300 qux1 1;",
                 "#N canvas 0 0 400 300 qux23 1;",
                 "#N canvas 0 0 400 300 qux23;funky 1;",
                 "#N canvas 0 0 400 300 trailing 1; junk",
             }) //
        {
            std::cout << "Input: " << quoted(input) << std::endl;
    
            if (PdCanvas root; parse(input.begin(), input.end(), parser, root))
                std::cout << "    Success -> " << boost::fusion::as_vector(root) << "\n";
            else
                std::cout << "    Failed\n";
        }
    }
    

    Prints:

    Input: "#N canvas 0 0 400 300 foo 1;"
        Success -> (0 0 400 300 foo 1)
    Input: "#N canvas 0 0 400 300 bar 1;"
        Success -> (0 0 400 300 bar 1)
    Input: "#N canvas 0 0 400 300 qux1 1;"
        Success -> (0 0 400 300 qux1 1)
    Input: "#N canvas 0 0 400 300 qux23 1;"
        Success -> (0 0 400 300 qux23 1)
    Input: "#N canvas 0 0 400 300 qux23;funky 1;"
        Success -> (0 0 400 300 qux23;funky 1)
    Input: "#N canvas 0 0 400 300 trailing 1; junk"
        Failed
    

    Or, with debug enabled (BOOST_SPIRIT_DEBUG defined before the includes), e.g.:

    Input: "#N canvas 0 0 400 300 foo 1;"
    <start>
      <try>#N canvas 0 0 400 30</try>
      <canvasRule>
        <try>#N canvas 0 0 400 30</try>
        <name>
          <try>foo 1;</try>
          <success> 1;</success>
          <attributes>[[f, o, o]]</attributes>
        </name>
        <success></success>
        <attributes>[[0, 0, 400, 300, [f, o, o], 1]]</attributes>
      </canvasRule>
      <success></success>
      <attributes>[[0, 0, 400, 300, [f, o, o], 1]]</attributes>
    </start>
        Success -> (0 0 400 300 foo 1)