Search code examples
c++boostboost-spiritboost-spirit-qi

qi % operator consumes (1) delimiter attributes and (2) accepts a trailing delimiter


Fiddling around to find out about some strange behaviours of my parsers I finally ended with the finding that qi % does not exactly behave as I expect.

First issue: In the verbous documentation a % b gets described as a shortcut for a >> *(b >> a). But it is actually not. This holds only, if you accept the b's to be discarded.

Say simple_id was any parser. Then actually

simple_id % lit(";") 

is the same as

simple_id % some_sophisticated_attribute_emitting_parser_expression

because the right hand side expression will be discarded in any case (i.e. does not contribute to any attributes). In detail: The first expression behaves exactly as (for example):

simple_id % string(";")

So string() is semantically equivalent to lit() if certain constraints hold, i.e. both live in the domain of being rh-operands of %. Here is my first question: Do you consider this to be a bug? Or is it a feature? I discussed that on the mailing list and got the answer that it's a feature, because this behaviour is documented (if you go into the very details of the doc). If you do so you find they are right.

I want to be a user of this library. I found that things go easy with qi on higher levels of grammar. But if you get down to bits and bytes and iterator positions, life gets hard. At a point I decided not to trust any longer and to track down into qi code.

It took me just a few minutes to track down my issue inside qi. Once having the responsible code (list.hpp) on the screen, it was obvious to me, that qi % has another issue. This here is the exact semantic of qi %

a % b <- a >> *(b >> a) >> -(b)

In words: It accepts a trailing b (and consumes it) even if it is not followed by an a. This is definitely not documented. Just for fun I looked into the X3 implementation of %. The bug has been migrated and occurs there as well.

Here are my questions: Is my analysis correct? If so, what parser library do you use? Can you recommend one? If I am wrong, where did I fail?

I post these questions because I am not the only one struggling. I hope the infos provided here are helpful.

Below is a self-contained working example demonstrating the issue(s) and the solution for both problems. If you run the example, have a look at the second test in particular. It shows the % consuming the trailing ; (what it should not, I think).

My env: MSVC 2015, Target: Win32 Console, Boost 1.6.1

///////////////////////////////////////////////////////////////////////////
// This is a self-contained demo which compiles with MSVC 2015 to Win32
// console. Therefore it should compile with any modern compiler. :)
//
//
// This demo implements a new qi operator != which does the same as %
// does but without eating up the delimiters (unless they are non-output
// i.e. lit).
//
// The implementation also shows how to fix a bug which makes the current
// qi % operator eat a trailing b. The current implementation accepts 
// a >> *(b >> a) >> -(b).
//
//
// I utilize the not_equal_to proto::tag for the alternative % operation
// See the simple rules to compare both operators.
///////////////////////////////////////////////////////////////////////////

//#define BOOST_SPIRIT_DEBUG
#include <io.h>
#include <map>
#include <boost/spirit/repository/include/qi_confix.hpp>
#include <boost/spirit/include/qi.hpp>

// Change the result type to test containers etc.
// You may need to provide an << ostream operator to have output work
using result_type = std::string;

using iterator_type = std::string::const_iterator;

namespace qi = boost::spirit::qi;
namespace mpl = boost::mpl;
namespace proto = boost::proto;

namespace maxence { namespace parser {
///////////////////////////////////////////////////////////////////////////////
//  The skipper grammar (just skip this section while reading ;)
///////////////////////////////////////////////////////////////////////////////
template <typename Iterator>
struct skipper : qi::grammar<Iterator>
{
    skipper() : skipper::base_type(start)
    {
        qi::char_type char_;
        using boost::spirit::eol;
        using boost::spirit::repository::confix;

        ascii::space_type space;

        start =
            space                               // tab/space/cr/lf
            | confix("/*", "*/")[*(char_ - "*/")] // C-style comments
            | confix("//", eol)[*(char_ - eol)] // C++-style comments
            ;
    }

    qi::rule<Iterator> start;
};
}}

namespace boost { namespace spirit {
        ///////////////////////////////////////////////////////////////////////////
        // Enablers
        ///////////////////////////////////////////////////////////////////////////
        template <>
        struct use_operator<qi::domain, proto::tag::not_equal_to> // enables p != d
            : mpl::true_ {};
}}
namespace ascii = boost::spirit::ascii;

namespace boost { namespace spirit { namespace qi
{
    template <typename Left, typename Right>
    struct list_ex : binary_parser<list_ex<Left, Right> >
    {
        typedef Left left_type;
        typedef Right right_type;

        template <typename Context, typename Iterator>
        struct attribute
        {
            // Build a std::vector from the LHS's attribute. Note
            // that build_std_vector may return unused_type if the
            // subject's attribute is an unused_type.
            typedef typename
                traits::build_std_vector<
                typename traits::
                attribute_of<Left, Context, Iterator>::type
                >::type
                type;
        };

        list_ex(Left const& left_, Right const& right_)
            : left(left_), right(right_) {}


/////////////////////////////////////////////////////////////////////////
// code from qi % operator
//
// Note: The original qi code accepts a >> *(b >> a) >> -(b)
//       That means a trailing delimiter gets consumed
//
//              template <typename F>
//              bool parse_container(F f) const
//              {
//                  // in order to succeed we need to match at least one element 
//                  if (f(left)) return false;
//                  typename F::iterator_type save = f.f.first;
//
//                  // The while clause below is wrong
//                  // To correct that (not eat trailing delimiters) it should read: 
//                  //  while (!(!right.parse(f.f.first, f.f.last, f.f.context, f.f.skipper, unused) && f(left)))
//                  
//                  while (right.parse(f.f.first, f.f.last, f.f.context, f.f.skipper, unused)   <--- issue!
//                      && !f(left))
//                  {
//                      save = f.f.first;
//                  }
// 
//                  f.f.first = save;
//              return true;
//
/////////////////////////////////////////////////////////////////////////


/////////////////////////////////////////////////////////////////////////
// replacement to allow operator not to "eat up" the "delimiter"
//
        template <typename F>
        bool parse_container(F f) const
        {
            // in order to succeed we need to match at least one element 
            if (f(left)) return false;

            while (!(f(right) && f(left)));

            return true;
        }
//
/////////////////////////////////////////////////////////////////////////

        template <typename Iterator, typename Context
            , typename Skipper, typename Attribute>
            bool parse(Iterator& first, Iterator const& last
                , Context& context, Skipper const& skipper
                , Attribute& attr_) const
        {
            typedef detail::fail_function<Iterator, Context, Skipper>
                fail_function;

            // ensure the attribute is actually a container type
            traits::make_container(attr_);

            Iterator iter = first;
            fail_function f(iter, last, context, skipper);
            if (!parse_container(detail::make_pass_container(f, attr_)))
                return false;

            first = f.first;
            return true;
        }

        template <typename Context>
        info what(Context& context) const
        {
            return info("list_ex",
                std::make_pair(left.what(context), right.what(context)));
        }

        Left left;
        Right right;
    };

    ///////////////////////////////////////////////////////////////////////////
    // Parser generators: make_xxx function (objects)
    ///////////////////////////////////////////////////////////////////////////
    template <typename Elements, typename Modifiers>
    struct make_composite<proto::tag::not_equal_to, Elements, Modifiers>
        : make_binary_composite<Elements, list_ex>
    {};
}}}

namespace boost {   namespace spirit {  namespace traits {
    ///////////////////////////////////////////////////////////////////////////
    template <typename Left, typename Right>
    struct has_semantic_action<qi::list_ex<Left, Right> >
        : binary_has_semantic_action<Left, Right> {};

    ///////////////////////////////////////////////////////////////////////////
    template <typename Left, typename Right, typename Attribute
        , typename Context, typename Iterator>
        struct handles_container<qi::list_ex<Left, Right>, Attribute, Context
        , Iterator>
        : mpl::true_ {};
}}}

using rule_type = qi::rule <iterator_type, result_type(), maxence::parser::skipper<iterator_type>>;

namespace maxence { namespace parser {

    template <typename Iterator>
    struct ident : qi::grammar < Iterator, result_type() , skipper<Iterator >>
    {
        ident();
        rule_type not_equal_to, modulus, not_used;
    };

    // we actually don't need the start rule (see below)
    template <typename Iterator>
    ident<Iterator>::ident() : ident::base_type(not_equal_to)
    {
        not_equal_to = (qi::alpha | '_') >> *(qi::alnum | '_') != qi::char_(";");
        modulus = (qi::alpha | '_') >> *(qi::alnum | '_') % qi::char_(";");
        modulus.name("qi modulus operator");

        BOOST_SPIRIT_DEBUG_NODES(
            (not_equal_to)
        )
    }
}}


int main()
{
    namespace parser = maxence::parser;

    using rule_map_type = std::map<std::string, rule_type&>;
    using rule_iterator_type = std::map<std::string, rule_type&>::const_iterator;
    using ss_map_type = std::map<std::string, std::string>;
    using ss_iterator_type = ss_map_type::const_iterator;


    parser::ident<iterator_type> ident;
    parser::skipper<iterator_type> skipper;

    ss_map_type parser_input =
    {
        { "; delimited list without trailing delimiter \n(expected result: success, EOI reached)", "willy; anton" },
        { "; delimited list with trailing delimiter \n(expected result: success, EOI not reached)", "willy; anton;" }
    };
    rule_map_type rules =
    {
        { "E1", ident.not_equal_to },
        { "E2", ident.modulus }
    };

    for (ss_iterator_type input = parser_input.begin(); input != parser_input.end(); input++) {
        for (rule_iterator_type example = rules.begin(); example != rules.end(); example++) {
            std::string to_parse = input->second;
            ::result_type result;
            std::string parser_name = (example->second).name();
            std::cout << "--------------------------------------------" << std::endl;
            std::cout << "Description: " << input->first << std::endl;
            std::cout << "Parser [" << parser_name << "] parsing [" << to_parse << "]" << std::endl;
            auto b(to_parse.begin()), e(to_parse.end());

            bool success = qi::phrase_parse(b, e, (example)->second, skipper, result);

            // --- test for parser success
            if (success) std::cout << "Parser succeeded. Result: " << result << std::endl;
            else std::cout << " Parser failed. " << std::endl;

            //--- test for EOI
            if (b == e) {
                std::cout << "EOI reached.";
            } else {
                std::cout << "Failure: EOI not reached. Remaining: [";
                while (b != e) std::cout << *b++; std::cout << "]";
            }
            std::cout << std::endl << "--------------------------------------------" << std::endl;
        }
    }
    return 0;
}

Extension: Because of the comments I extend my post:

My != operator is different from the % operator . The != operator would add all the 'delimiters' found to the result vector. (a != qi::char_(";,")). To introduce my proposal to % would discard useful functionality.

Maybe there is a justification to introduce an additional operator. I think I should use another operator for that, != hurts my eyes. Anyway, the != operator has nice applications also. For example:

settings_list = name != expression;

I thought it is wrong that % does not eat trailing 'delimiters'. My code example above seemed to demonstrate that. Anyway, I stripped the example down to focus on that issue only. Now I know that missing ; are sitting happily somewhere in the Carribean having a Caipirinha. Better than being eaten. :)

The example below eats the trailing 'delimiter', because it's not really trailing. The issue was my test string. The Kleene star has a zero match after the last ;. Therefore it gets eaten which is correct behaviour.

I learned much about qi during this 'trip'. More than from the docs. Most important lesson learned: Shape your test cases carefully. A did a quick copy&paste from some example without thought. That introduced the problems.

#include <iostream>
#include <map>
#include <boost/spirit/include/qi.hpp>


namespace qi = boost::spirit::qi;
using iterator_type = std::string::const_iterator;
using result_type = std::string;

template <typename Parser>
void parse(const std::string message, const std::string& input, const Parser& parser)
{
    iterator_type iter = input.begin(), end = input.end();

    std::vector<result_type> parsed_result;

    std::cout << "-------------------------\n";
    std::cout << message << "\n";
    std::cout << "Parsing: \"" << input << "\"\n";

    bool result = qi::phrase_parse(iter, end, parser, qi::space, parsed_result);
    if (result)
    {
        std::cout << "Parser succeeded.\n";
        std::cout << "Parsed " << parsed_result.size() << " elements:";
        for (const auto& str : parsed_result)
            std::cout << "[" << str << "]";
        std::cout << std::endl;
    }
    else
    {
        std::cout << "Something failed. Unparsed: \"" << std::string(iter, end) << "\"" << std::endl;
    }
    if (iter == end) {
        std::cout << "EOI reached." << std::endl;
    }
    else {
        std::cout << "EOI not reached. Unparsed: \"" << std::string(iter, end) << "\"" << std::endl;
    }
    std::cout << "-------------------------\n";

}

int main()
{

    auto r1 = (*(qi::alpha | '_')) % qi::char_(";");
    auto r2 = qi::as_string[*(qi::alpha | '_')] % qi::char_(";");

    parse("% eating the trailing delimiter 'delimiter'",  
        "willy; anton; 1234", r1);
    parse("% eating the trailing 'delimiter' (limited as_string edition)'",
        "willy; anton; 1234", r2);

    return 0;
}

Solution

  • Here are the answers to all of the questions.

    (1) My analysis was incorrect. The % operator does not eat trailing 'delimiters'. The real problem was the parsing rule beeing a Kleene star rule. This rule matched did not find an identifier after the last 'delimiter', but it matched zero. So it is perfectly ok that % consumes the 'delimiter'.

    (2) I am not looking for a qi alternative currently.

    (3) The current implementation of % does not 'discard' the b of a % b. If in deed you have

    simple_id % some_sophisticated_attribute_emitting_parser_expression
    

    then the sophisticated thingy (which may be dynamic (like char_("+-*/")) must match for % to continue. My proposed change to % would break this feature.

    To have %= (see below) operate like % you'd have to use (a %= qi::omit[b]). This mimics a % b almost completely. The difference remains that %= intentionally eats the 'trailing delimiter'. There is an example for that in the code below. Therefore %= can not be taken as a superset of %.

    If qi should be extended by an operator which provides the functionality I requested is a discussion I do not want to promote. Regarding parser functionality qi is easy extensible, so that you can produce additional parsers to your liking.

    That compilers are allergic to qi 2.x with auto is another topic. More complex. I never thought, that in particular I with my MSVC 2015 environment would ever be on the non-crashing side of life.

    Anyway, I owe you what for having me insisting so much so stupidly. The code below provides an implementation of a %= operator (modulus_assign) for qi. It is implemented as list2 living in the mxc::qitoo namespace. I marked the header start and end if somebody finds it valuable and wants to use it.

    The main function is a show case demonstrating the commons and differences between the two operators. And showing once more that Kleene star is wild creature.

    #include <iostream>
    #include <map>
    
    
    ///////////////////////////
    // start: header list2.hpp
    ///////////////////////////
    
    #pragma once
    
    #include <boost/spirit/include/qi.hpp>
    
    namespace boost {
        namespace spirit {
            ///////////////////////////////////////////////////////////////////////////
            // Enablers
            ///////////////////////////////////////////////////////////////////////////
            template <>
            struct use_operator<qi::domain, proto::tag::modulus_assign> // enables p %= d
                : mpl::true_ {};
        }
    }
    
    namespace mxc {
        namespace qitoo {
    
            namespace spirit = boost::spirit;
            namespace qi = spirit::qi;
    
            template <typename Left, typename Right>
            struct list2 : qi::binary_parser<list2<Left, Right> >
            {
                typedef Left left_type;
                typedef Right right_type;
    
                template <typename Context, typename Iterator>
                struct attribute
                {
                    // Build a std::vector from the LHS's and RHS's attribute. Note
                    // that build_std_vector may return unused_type if the
                    // subject's attribute is an unused_type.
                    typedef typename
                        spirit::traits::build_std_vector<
                        typename spirit::traits::attribute_of<Left, Context, Iterator>::type>::type type;
                };
    
                list2(Left const& left_, Right const& right_) : left(left_), right(right_) {}
    
                template <typename F>
                bool parse_container(F f) const
                {
                    typename F::iterator_type save = f.f.first;
    
                    // we need a first left match at least
                    if (f(left)) return false;
    
                    // if right does not match rewind iterator and fail
                    if (f(right)) {
                        f.f.first = save;
                        return false;
                    }
    
                    // easy going
                    while (!f(left) && !f(right))
                    {
                        save = f.f.first;
                    }
    
                    f.f.first = save;
                    return true;
                }
    
                template <typename Iterator, typename Context, typename Skipper, typename Attribute>
                bool parse(Iterator& first, Iterator const& last, Context& context, Skipper const& skipper, Attribute& attr_) const
                {
                    typedef qi::detail::fail_function<Iterator, Context, Skipper>
                        fail_function;
    
                    // ensure the attribute is actually a container type
                    spirit::traits::make_container(attr_);
    
                    Iterator iter = first;
                    fail_function f(iter, last, context, skipper);
                    if (!parse_container(qi::detail::make_pass_container(f, attr_)))
                        return false;
    
                    first = f.first;
                    return true;
                }
    
                template <typename Context>
                qi::info what(Context& context) const
                {
                    return qi::info("list2",
                        std::make_pair(left.what(context), right.what(context)));
                }
    
                Left left;
                Right right;
            };
        }
    }
    
    namespace boost {
        namespace spirit {
            namespace qi {
                ///////////////////////////////////////////////////////////////////////////
                // Parser generators: make_xxx function (objects)
                ///////////////////////////////////////////////////////////////////////////
                template <typename Elements, typename Modifiers>
                struct make_composite<proto::tag::modulus_assign, Elements, Modifiers>
                    : make_binary_composite<Elements, mxc::qitoo::list2>
                {};
            }
    
            namespace traits
            {
                ///////////////////////////////////////////////////////////////////////////
                template <typename Left, typename Right>
                struct has_semantic_action<mxc::qitoo::list2<Left, Right> >
                    : binary_has_semantic_action<Left, Right> {};
    
                ///////////////////////////////////////////////////////////////////////////
                template <typename Left, typename Right, typename Attribute
                    , typename Context, typename Iterator>
                    struct handles_container<mxc::qitoo::list2<Left, Right>, Attribute, Context
                    , Iterator>
                    : mpl::true_ {};
            }
        }
    }
    ///////////////////////////
    // end: header list2.hpp
    ///////////////////////////
    
    namespace qi = boost::spirit::qi;
    namespace qitoo = mxc::qitoo;
    
    using iterator_type = std::string::const_iterator;
    using result_type = std::string;
    
    template <typename Parser>
    void parse(const std::string message, const std::string& input, const std::string& rule, const Parser& parser)
    {
        iterator_type iter = input.begin(), end = input.end();
    
        std::vector<result_type> parsed_result;
    
        std::cout << "-------------------------\n";
        std::cout << message << "\n";
        std::cout << "Rule: " << rule << std::endl;
        std::cout << "Parsing: \"" << input << "\"\n";
    
        bool result = qi::phrase_parse(iter, end, parser, qi::space, parsed_result);
        if (result)
        {
            std::cout << "Parser succeeded.\n";
            std::cout << "Parsed " << parsed_result.size() << " elements:";
            for (const auto& str : parsed_result)
                std::cout << "[" << str << "]";
            std::cout << std::endl;
        }
        else
        {
            std::cout << "Parser failed" << std::endl;
        }
        if (iter == end) {
            std::cout << "EOI reached." << std::endl;
        }
        else {
            std::cout << "EOI not reached. Unparsed: \"" << std::string(iter, end) << "\"" << std::endl;
        }
        std::cout << "-------------------------\n";
    
    }
    
    int main()
    {
        parse("Modulus-Assign Operator (%), list with several different 'delimiters'  "
            , "willy; anton; frank, joel, 1234"
            , "(+(qi::alpha | qi::char_('_'))) % qi::char_(\";,\"))"
            , (+(qi::alpha | qi::char_('_'))) % qi::char_(";,"));
    
        parse("Modulus-Assign Operator (%=), list with several different 'delimiters'  "
            , "willy; anton; frank, joel, 1234"
            , "(+(qi::alpha | qi::char_('_'))) %= qi::char_(\";,\"))"
            , (+(qi::alpha | qi::char_('_'))) %= qi::char_(";,"));
    
        parse("Modulus-Assign Operator (%), list with several different 'delimiters'  "
            , "willy; anton; frank, joel, 1234"
            , "((qi::alpha | qi::char_('_')) >> *(qi::alnum | '_')) % qi::char_(\";,\"))"
            , ((qi::alpha | qi::char_('_')) >> *(qi::alnum | '_')) % qi::char_(";,"));
    
        parse("Modulus-Assign Operator (%=), list with several different 'delimiters'  "
            , "willy; anton; frank, joel, 1234"
            , "((qi::alpha | qi::char_('_')) >> *(qi::alnum | '_')) %= qi::char_(\";,\"))"
            , ((qi::alpha | qi::char_('_')) >> *(qi::alnum | '_')) %= qi::char_(";,"));
    
        std::cout << std::endl << "Note that %= exposes the trailing 'delimiter' and it has to to enable this usage:" << std::endl;
    
        parse("Modulus-Assign Operator (%=), list with several different 'delimiters'\n using omit to mimic %"
            , "willy; anton; frank, joel, 1234"
            , "+(qi::alpha | qi::char_('_')) %= qi::omit[qi::char_(\";,\"))]"
            , +(qi::alpha | qi::char_('_')) %= qi::omit[qi::char_(";,")]);
    
        parse("Modulus Operator (%), list of assignments (x = digits;)\nBe careful with the Kleene star, Eugene!"
            , "x = 5; y = 7; z = 10; = 7;"
            , "*(qi::alpha | qi::char_('_')) %= (qi::lit(\"=\") >> +qi::digit >> qi::lit(';')))"
            , *(qi::alpha | qi::char_('_')) %= (qi::lit("=") >> +qi::digit >> qi::lit(';')));
    
        parse("Modulus-Assign Operator (%=), list of assignments (*bio hazard edition*)\nBe careful with the Kleene star, Eugene!"
            , "x = 5; y = 7; z = 10; = 7;"
            , "*(qi::alpha | qi::char_('_')) %= (qi::lit(\"=\") >> +qi::digit >> qi::lit(';')))"
            , *(qi::alpha | qi::char_('_')) %= (qi::lit("=") >> +qi::digit >> qi::lit(';')));
    
        parse("Modulus-Assign Operator (%=), list of assignments (x = digits;)\nBe careful with the Kleene star, Eugene!"
            , "x = 5; y = 7; z = 10; = 7;"
            , "+(qi::alpha | qi::char_('_')) %= (qi::lit(\"=\") >> +qi::digit >> qi::lit(';')))"
            , +(qi::alpha | qi::char_('_')) %= (qi::lit("=") >> +qi::digit >> qi::lit(';')));
        return 0;
    }