Boost Spirit Qi - list parsing with two component sequences

I am trying to write a parser for an argument list that would allow something along the lines of the following:

myFunc( arg0, arg1, namedArg0 = valueA, namedArg1 = valueB )

In the above example, I would like the first two arguments to resolve to entities of TypeA, which would then be contained by a std::vector< TypeA >. The second two arguments would resolve to TypeB, which would be contained by a std::vector< TypeB >. All TypeA arguments should come before all TypeB arguments. But I would like all to be parsed from a single comma-separated list. It should be possible to have only TypeA arguments, only TypeB arguments or a sequence of TypeA elements followed by a sequence of TypeB elements.

I'm having trouble defining the rules such that the comma separating the final TypeA argument from the first TypeB argument is not mistaken for the expectation of another TypeA argument.

My current implementation is below. Can anyone offer any suggestions as to how to approach this problem?

The key distinction here is that TypeA arguments should be a single symbol whereas TypeB arguments should take the form of: symbol = symbol.

The problem seems to be related to the fact that TypeA arguments are equivalent to the first portion of TypeB arguments, therefore making the end of the TypeA sequence unclear?

Thanks!

struct Params
{
    std::vector<TypeA> a_elements;
    std::vector<TypeB> b_elements;

    Params(const std::vector<TypeA>& a_vec, const std::vector<TypeB>& b_vec)
    : a_elements( a_vec ), b_elements( b_vec ) {}

    static Params create(const std::vector<TypeA>& a_vec, const std::vector<TypeB>& b_vec)
    {
        return Params( a_vec, b_vec );
    }
};

struct ParamsParser : qi::grammar<Iterator, Params(), Skipper>
{
    qi::rule<Iterator, Params(), Skipper>                           start_rule;
    qi::rule<Iterator, std::vector<TypeA>(), Skipper>               type_a_vec_opt_rule;
    qi::rule<Iterator, std::vector<TypeB>(), Skipper>               type_b_vec_opt_rule;
    qi::rule<Iterator, std::vector<TypeA>(), Skipper>               type_a_vec_rule;
    qi::rule<Iterator, std::vector<TypeB>(), Skipper>               type_b_vec_rule;
    qi::rule<Iterator, TypeA(), Skipper>                            type_a_rule;
    qi::rule<Iterator, TypeB(), Skipper>                            type_b_rule;    
    qi::rule<Iterator, std::string(), Skipper>                      symbol_rule;

    ParamsParser() : ParamsParser::base_type( start_rule, "params_parser" )
    {
        start_rule =
        // version 1:
          ( ( '(' >> type_a_vec_rule >> ',' >> type_b_vec_rule >> ')' )
           [ qi::_val = boost::phoenix::bind( Params::create, qi::_1, qi::_2 ) ] )
        // version 2:
        | ( ( '(' >> type_a_vec_opt_rule >> ')' )
           [ qi::_val = boost::phoenix::bind( Params::create, qi::_1, std::vector<TypeB>() ) ] )
        // version 3:
        | ( ( '(' >> type_b_vec_opt_rule >> ')' )
           [ qi::_val = boost::phoenix::bind( Params::create, std::vector<TypeA>(), qi::_1 ) ] )
        ;

        type_a_vec_opt_rule = -type_a_vec_rule;
        type_b_vec_opt_rule = -type_b_vec_rule;
        type_a_vec_rule     = ( type_a_rule % ',' );        
        type_b_vec_rule     = ( type_b_rule % ',' );
        type_a_rule         = ( symbol_rule );
        type_b_rule         = ( symbol_rule >> '=' >> symbol_rule );
        symbol_rule         = qi::char_( "a-zA-Z_" ) >> *qi::char_( "a-zA-Z_0-9" );
    }
};

Solution

Two problems. First you want to make sure that you don't match a positional argument where a named argument could be matched¹.

Second: you want them in separate collections.

In the above example, I would like the first two arguments to resolve to entities of TypeA, which would then be contained by a std::vector< TypeA >. The second two arguments would resolve to TypeB, which would be contained by a std::vector< TypeB >. All TypeA arguments should come before all TypeB arguments.

So, you'd naively write

argument_list = '(' >> -positional_args >> -named_args >> ')';

Where

positional_args = expression % ',';
named_args      = named_arg % ',';
named_arg       = identifier >> '=' > expression;

Of course you already observed this would go awry with the optional interpunction between positionals and named args. But first things first.

Let's prevent positionals from matching where named could match:

positional_args = (!named_arg >> expression) % ',';

This is rather blunt. Depending on your precise expression/identifier productions you could use a more efficient differentiator, but this is the simplest thing that works.

Now, to continue in the same spirit, the simplest thing that could work with regards to the ',' between positionals/named would be to ... simply check that IF there was a positional THEN it must be followed by either ) (which is only peeked at, not consumed) or , (which is consumed). Quickly:

argument_list = '(' >> positional_args >> -named_args >> ')';
positional_args = *(expression >> (&lit(')') | ','));

Note how positional_args now allows for an empty match, so it no longer needs to be optional in the argument_list rule.

Upshot:

you have your grammar
it naturally parses into two subsequent containers like vector<TypeA>,vector<TypeB>

What more could you want?

Objection, Your Honour!!

I can almost sense the response: I want more elegance! After all, now positional_args is "encumbered" with knowledge about named args as well as expecting the end of a parameter list.

Yes. This is a valid concern. I concur: for more involved grammars I'd much rather write

 argument_list = '(' >> -(argument % ',') >> ')';

This would then naturally parse to a container of TypeAOrB² and you'd do some semantic checks to ensure no positional arguments come after named ones:

Live On Coliru³

arguments %= 
    eps [ found_named_arg = false ] // initialize the local (_a)
    >> '(' >> -(argument(found_named_arg) % ',') >> ')';

argument  %= (named_arg > eps [named_only=true])  
           | (eps(!named_only) >> positional);

I would argue this is about as clumsy as the above, so why complicate the AST if you can have the natural vectors you want anyways?

Have Your Cake And Eat It

Yes you can combine all that with lots of Phoenix wizardry OR use custom attribute propagation to sort the types of arguments into your AST "buckets". Several answers exist on SO showing how to do chicanery like that.

parsing into several vector members
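A lower-tech alternative to doing this inside the grammar is a plain post-processing pass over the parsed variant list, which performs the ordering check and the sorting into buckets in one go. The sketch below is an illustration only: it uses std::variant (C++17) in place of boost::variant to stay dependency-free, and Positional, Named, Buckets and split_args are hypothetical names with made-up fields.

```cpp
#include <optional>
#include <string>
#include <variant>
#include <vector>

// Hypothetical AST nodes; the real TypeA / TypeB would go here.
struct Positional { std::string value; };
struct Named      { std::string name, value; };

using Arg = std::variant<Positional, Named>;

struct Buckets {
    std::vector<Positional> positional;
    std::vector<Named>      named;
};

// Splits a mixed argument list into two vectors.
// Returns std::nullopt if a positional argument follows a named one.
std::optional<Buckets> split_args(std::vector<Arg> const& args) {
    Buckets out;
    for (Arg const& arg : args) {
        if (auto const* named = std::get_if<Named>(&arg)) {
            out.named.push_back(*named);
        } else if (!out.named.empty()) {
            return std::nullopt; // positional after named: reject
        } else {
            out.positional.push_back(std::get<Positional>(arg));
        }
    }
    return out;
}
```

This keeps the grammar itself trivial (argument % ','), at the cost of reporting ordering errors only after the whole list has been parsed.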

However I think for the question as posed there is no good reason to introduce any of that.

¹ OT rant: why not use proper terms instead of smoke and mirrors like "typeA" and "typeB". If it's really a secret, don't post problems on SO. If it's not, don't hide context, because 99% of the time problems and solutions follow from context

² boost::variant<TypeA, TypeB>

³ full source of Live On Coliru for anti-bitrot purposes:

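#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/variant.hpp>
#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>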
namespace qi = boost::spirit::qi;

struct Positional { };
struct Named      { };

using Arg  = boost::variant<Positional, Named>;
using Args = std::vector<Arg>;

template <typename Iterator>
struct ArgsParser : qi::grammar<Iterator, Args()>
{
    ArgsParser() : ArgsParser::base_type( start_rule, "params_parser" )
    {
        using namespace qi;
        start_rule = skip(space) [ arguments ];

        arguments %= 
            eps [ found_named_arg = false ] // initialize the local (_a)
            >> '(' >> -(argument(found_named_arg) % ',') >> ')';

        argument  %= (named_arg > eps [named_only=true])  
                   | (eps(!named_only) >> positional);

        named_arg  = "n" >> attr(Named{});
        positional = "p" >> attr(Positional{});
    }

  private:
    using Skipper = qi::space_type;
    qi::rule<Iterator, Args()> start_rule;

    qi::rule<Iterator, Args(),       Skipper, qi::locals<bool> > arguments;
    qi::rule<Iterator, Arg(bool&),   Skipper> argument;
    qi::rule<Iterator, Named(),      Skipper> named_arg;
    qi::rule<Iterator, Positional(), Skipper> positional;

    qi::_a_type  found_named_arg;
    qi::_r1_type named_only;
};

// for debug output
static inline std::ostream& operator<<(std::ostream& os, Named)      { return os << "named";      }
static inline std::ostream& operator<<(std::ostream& os, Positional) { return os << "positional"; }

int main() {
    using It = std::string::const_iterator;
    ArgsParser<It> const p;

    for (std::string const input : {
            "()",
            "(p)",
            "(p,p)",
            "(p,n)",
            "(n,n)",
            "(n)",
            // the remaining inputs are expected to fail
            "(n,p)",
            "(p,p,n,p,n)",
            "(p,p,n,p)",
            })
    {
        std::cout << " ======== " << input << " ========\n";

        It f(input.begin()), l(input.end());
        Args parsed;
        if (parse(f,l,p,parsed)) {
            std::cout << "Parsed " << parsed.size() << " arguments in list: ";
            std::copy(parsed.begin(), parsed.end(), std::ostream_iterator<Arg>(std::cout, " "));
            std::cout << "\n";
        } else {
            std::cout << "Parse failed\n";
        }

        if (f!=l) {
            std::cout << "Remaining input unparsed: '" << std::string(f,l) << "'\n";
        }
    }
}

Prints

 ======== () ========
Parsed 0 arguments in list: 
 ======== (p) ========
Parsed 1 arguments in list: positional 
 ======== (p,p) ========
Parsed 2 arguments in list: positional positional 
 ======== (p,n) ========
Parsed 2 arguments in list: positional named 
 ======== (n,n) ========
Parsed 2 arguments in list: named named 
 ======== (n) ========
Parsed 1 arguments in list: named 
 ======== (n,p) ========
Parse failed
Remaining input unparsed: '(n,p)'
 ======== (p,p,n,p,n) ========
Parse failed
Remaining input unparsed: '(p,p,n,p,n)'
 ======== (p,p,n,p) ========
Parse failed
Remaining input unparsed: '(p,p,n,p)'