
Boost spirit skipper issues


I am having trouble with Boost.Spirit skippers.

I need to parse a file like this:

ROW int
int [int, int]
int [int, int]
...

I can parse it without problems (thanks to Stack Overflow ;)) only if I add an '_' after the first int.

In fact, I think the skipper eats the end of line after the first int, so the first int and the second one (on the second line) look like a single int. I don't understand how to keep eol but skip spaces. I've found examples that use a custom parser, like here and here.

I tried qi::blank and a custom parser with a single rule, lit(' '). No matter which skipper I use, spaces and eol are always eaten.

My grammar is:

A line:

struct rowType
{
    unsigned int number;
    std::list<unsigned int> list;
};

The full problem, stored in a structure:

struct problemType
{
    unsigned int ROW;
    std::vector<rowType> rows;
};

The row parser:

template<typename Iterator>
struct row_parser : qi::grammar<Iterator, rowType(), qi::space_type>
{
    row_parser() : row_parser::base_type(start)
    {

        list  = '[' >> -(qi::int_ % ',') >> ']';
        start = qi::int_ >> list;
    }

    qi::rule<Iterator, rowType(), qi::space_type> start;
    qi::rule<Iterator, std::list<unsigned int>(), qi::space_type> list;
};

And the problem parser:

template<typename Iterator>
struct problem_parser : qi::grammar<Iterator,problemType(),qi::space_type>
{

    problem_parser() : problem_parser::base_type(start)
    {
        using boost::phoenix::bind;
        using qi::lit;

        start = qi::int_ >> lit('_') >> +(row);

        //BOOST_SPIRIT_DEBUG_NODE(start);
    }

    qi::rule<Iterator, problemType(),qi::space_type> start;
    row_parser<Iterator> row;
};

And I use it like this:

main() {
static const problem_parser<spirit::multi_pass<base_iterator_type> > p;
...
spirit::qi::phrase_parse(first, last ,
            p,
            qi::space,
            pb);
}

Of course, qi::space is my problem, and one way to solve it would be not to use a skipper at all, but phrase_parse requires one, and so my parser requires one too.

I've been stuck for some hours now... I think I have misunderstood something obvious.

Thanks for your help.


Solution

  • In general the following directives are helpful for inhibiting/switching skippers mid-grammar:

    • qi::lexeme [ p ]
      which inhibits the skipper (e.g. if you want to be sure you parse an identifier without internal skips); see also no_skip for comparison

    • qi::raw [ p ]
      which parses as usual, including skips, but returns the raw iterator range of the matched source sequence (including the skipped positions)

    • qi::no_skip [ p ]
      which inhibits skipping without pre-skipping (I've created a minimal example to demonstrate the difference from lexeme here: Boost Spirit lexeme vs no_skip)

    • qi::skip(s) [ p ]
      which replaces the skipper by another skipper s altogether (note that you need to use appropriately declared qi::rule<> instances inside such a skip[] clause)

    where p is any parser expression.

    Specific solution

    Your problem, as you already know, might be that qi::space eats all whitespace. I can't possibly know what is wrong in your grammar (since you show neither the full grammar nor the relevant input).

    Therefore, here's what I'd write. Note

    • the use of qi::eol to explicitly require line breaks at specific locations
    • the use of qi::blank as a skipper (not including eol)
    • for brevity I combined the grammars

    Code:

    #define BOOST_SPIRIT_DEBUG
    #include <boost/fusion/adapted.hpp>
    #include <boost/spirit/include/qi.hpp>
    #include <boost/spirit/include/phoenix.hpp>
    #include <iostream>
    #include <list>
    #include <string>
    #include <vector>
    
    namespace qi = boost::spirit::qi;
    namespace phx = boost::phoenix;
    
    struct rowType {
        unsigned int number;
        std::list<unsigned int> list;
    };
    
    struct problemType {
        unsigned int ROW;
        std::vector<rowType> rows;
    };
    
    BOOST_FUSION_ADAPT_STRUCT(rowType, (unsigned int, number)(std::list<unsigned int>, list))
    BOOST_FUSION_ADAPT_STRUCT(problemType, (unsigned int, ROW)(std::vector<rowType>, rows))
    
    template<typename Iterator>
    struct problem_parser : qi::grammar<Iterator,problemType(),qi::blank_type>
    {
        problem_parser() : problem_parser::base_type(problem)
        {
            using namespace qi;
            list    = '[' >> -(int_ % ',') >> ']';
            row     = int_ >> list >> eol;
            problem = "ROW" >> int_ >> eol >> +row;
    
            BOOST_SPIRIT_DEBUG_NODES((problem)(row)(list));
        }
    
        qi::rule<Iterator, problemType()            , qi::blank_type> problem;
        qi::rule<Iterator, rowType()                , qi::blank_type> row;
        qi::rule<Iterator, std::list<unsigned int>(), qi::blank_type> list;
    };
    
    int main()
    {
        const std::string input = 
            "ROW 1\n"
            "2 [3, 4]\n"
            "5 [6, 7]\n";
    
        auto f = begin(input), l = end(input);
    
        problem_parser<std::string::const_iterator> p;
        problemType data;
    
        bool ok = qi::phrase_parse(f, l, p, qi::blank, data);
    
        if (ok) std::cout << "success\n";
        else    std::cout << "failed\n";
    
        if (f!=l)
            std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
    }
    

    If you really don't want to require line breaks:

    template<typename Iterator>
    struct problem_parser : qi::grammar<Iterator,problemType(),qi::space_type>
    {
        problem_parser() : problem_parser::base_type(problem)
        {
            using namespace qi;
            list    = '[' >> -(int_ % ',') >> ']';
            row     = int_ >> list;
            problem = "ROW" >> int_ >> +row;
    
            BOOST_SPIRIT_DEBUG_NODES((problem)(row)(list));
        }
    
        qi::rule<Iterator, problemType()            , qi::space_type> problem;
        qi::rule<Iterator, rowType()                , qi::space_type> row;
        qi::rule<Iterator, std::list<unsigned int>(), qi::space_type> list;
    };
    
    int main()
    {
        const std::string input = 
            "ROW 1 " // NOTE whitespace, obviously required!
            "2 [3, 4]"
            "5 [6, 7]";
    
        auto f = begin(input), l = end(input);
    
        problem_parser<std::string::const_iterator> p;
        problemType data;
    
        bool ok = qi::phrase_parse(f, l, p, qi::space, data);
    
        if (ok) std::cout << "success\n";
        else    std::cout << "failed\n";
    
        if (f!=l)
            std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
    }
    

    Update

    In response to the comment: here is a snippet that shows how to read the input from a file. This was tested and works fine for me:

    std::ifstream ifs("input.txt"/*, std::ios::binary*/);
    ifs.unsetf(std::ios::skipws);
    
    boost::spirit::istream_iterator f(ifs), l;
    
    problem_parser<boost::spirit::istream_iterator> p;