
Constructing a qi::rule with a function attribute


I'm trying to create a rule that returns a function<char(char const *)> constructed by currying a Phoenix expression. E.g.,

start = int_[_val = xxx];
rule<Iterator, function<char(char const *)> start;

What should xxx be so that parsing the string "5" gives me a function that returns the character at index 5 of its input? I thought something like lambda(_a = arg1)[arg1[_a]](_1) might work, but I haven't been able to hit on the magic formula.

In other words, I'd like the attribute to curry arg2[arg1] on the value of the parsed int.
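To make the target concrete, here is a minimal hand-written C++03 sketch of the function the rule should synthesize: an object that remembers the parsed index and later applies `s[index]` to its argument. The names `CharAt` and `make_char_at` are hypothetical, chosen only for illustration; with Boost available, an object like this can be assigned to a `boost::function<char(char const*)>`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical hand-written equivalent of currying arg2[arg1]:
// the index (arg1) is fixed at construction, the string (arg2) comes later.
class CharAt
{
public:
    explicit CharAt(int index) : index_(index) {}

    char operator()(char const* s) const
    {
        // Precondition: the index must fall inside the string.
        assert(index_ >= 0 && static_cast<std::size_t>(index_) < std::strlen(s));
        return s[index_];
    }

private:
    int index_; // stored by value, so the object stays valid after parsing
};

// What the semantic action should do with the parsed int:
// bind it into a new one-argument function object.
inline CharAt make_char_at(int parsed) { return CharAt(parsed); }
```

For example, `make_char_at(0)("Hello")` yields `'H'`, and `make_char_at(4)("World")` yields `'d'`.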

Very grateful for any suggestions. Note that I'm on VC2008, so C++11 lambdas are not available.

Mike


Solution

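  • After fixing that rule declaration (its template argument list was missing a closing >, and Qi expects the attribute as a function signature, Func()):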

    typedef boost::function<char(char const*)> Func;
    qi::rule<Iterator, Func()> start;
    

    it worked: Live On Coliru (C++03).

    UPDATE:

    Why did I end up with such a complex contraption?

    qi::_val = px::bind(px::lambda[arg1[arg2]], px::lambda[arg1], qi::_1)
    

    Well, let me tell you about the joy of combining functional composition with lazy evaluation (in C++ template metaprogramming, which has these surprises with reference/value semantics). Don't do the following:

    qi::_val = px::lambda(_a = qi::_1) [arg1[_a]] // UB!!! DON'T DO THIS
    

    Depending on the compiler and optimization level, this might *appear* to work. But it invokes Undefined Behaviour [1]. The problem is that qi::_1 is kept as a reference to the attribute exposed by the qi::int_ parser expression, and once the lifetime of the parser context has ended, that reference dangles.

    So calling the resulting functor reads through an invalid reference. To avoid this, store a copy of the value instead (Live On Coliru):

    qi::_val = px::lambda(_a = px::val(qi::_1)) [arg1[_a]]
    

    or even, if you like obscure code (the unary + yields a temporary int rather than a reference, so the local holds a copy):

    qi::_val = px::lambda(_a = +qi::_1) [arg1[_a]]
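The unary `+` trick rests on a core-language fact: applying the built-in `+` to an `int` lvalue produces a new `int` value (a temporary), not a reference to the original object. The C++03 sketch below checks this with overload resolution; `is_lvalue` and `check_unary_plus` are hypothetical helpers. It only demonstrates the underlying C++ rule; that Phoenix's lazy `+qi::_1` carries this over to how `_a` is stored is the claim made in the answer above.

```cpp
#include <cassert>

// Overload pair that tells lvalues from rvalues in C++03:
// a non-const lvalue reference cannot bind to a temporary.
inline bool is_lvalue(int&)       { return true;  }
inline bool is_lvalue(int const&) { return false; }

// 'x' names an object; '+x' is a fresh temporary holding a copy of its value.
inline void check_unary_plus()
{
    int x = 5;
    assert(is_lvalue(x));   // binds to int&: refers to x itself
    assert(!is_lvalue(+x)); // only int const& matches: a new value
}
```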
    

    Or, you know, you can stick with the bound nested lambda, since bind defaults to value semantics for qi::_1 (unless you use the px::cref/px::ref wrappers).
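The lifetime problem is not specific to Phoenix. In the plain-C++ sketch below (all names hypothetical), `parse_index` plays the role of the parser context: its local `attr` stands in for the attribute exposed by `qi::int_` and dies when the function returns. The returned function object copies the value at construction time, which is the property that `px::val`, the unary `+`, and the value semantics of `px::bind` provide in the expressions above. A member of type `int const&` bound to `attr` would instead dangle, just like the `_a = qi::_1` version.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical function object that captures the index BY VALUE.
struct CharAtCopy
{
    explicit CharAtCopy(int i) : index_(i) {}
    char operator()(char const* s) const { return s[index_]; }
    int index_; // a copy: remains valid after parse_index returns
};

// Stand-in for the parser context: 'attr' plays the role of the attribute
// that qi::int_ exposes, and it goes out of scope when this function returns.
CharAtCopy parse_index(std::string const& input)
{
    int attr = std::atoi(input.c_str());
    assert(attr >= 0); // negative indices make no sense for this example
    // Safe: the constructor copies attr. Storing 'int const&' to attr
    // here would leave the returned object with a dangling reference.
    return CharAtCopy(attr);
}
```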

    I hope the above analysis drives home the point I made in the comments earlier:

    Note that I wouldn't recommend this code style. Higher-order programming with Phoenix is tricky enough without composing them from within lazy actors in some embedded expression-template DSL: qi::_val = px::bind(px::lambda[arg1[arg2]], px::lambda[arg1], qi::_1). 'Nuff said?


    #define BOOST_SPIRIT_USE_PHOENIX_V3
    #include <boost/spirit/include/qi.hpp>
    #include <boost/spirit/include/phoenix.hpp>
    #include <boost/spirit/include/phoenix_operator.hpp>
    #include <boost/function.hpp>
    #include <iostream> // std::cout
    #include <string>   // std::string
    
    namespace qi = boost::spirit::qi;
    namespace px = boost::phoenix;
    
    typedef boost::function<char(char const*)> Func;
    
    int main()
    {
        typedef std::string::const_iterator Iterator;
        using namespace boost::phoenix::arg_names;
        using boost::phoenix::local_names::_a; // only needed by the alternative below
    
        qi::rule<Iterator, Func()> start;
    
        start = qi::int_ 
                [ qi::_val = px::bind(px::lambda[arg1[arg2]], px::lambda[arg1], qi::_1) ];
        // or:  [ qi::_val = px::lambda(_a = px::val(qi::_1))[arg1[_a]] ];
    
        static char const* inputs[] = { "0", "1", "2", "3", "4", 0 };
    
        for (char const* const* it = inputs; *it; ++it)
        {
            std::string const input(*it);
            Iterator f(input.begin()), l(input.end());
    
            Func function;
            bool ok = qi::parse(f, l, start, function);
    
            if (ok)
                std::cout << "Parse resulted in function() -> character " 
                   << function("Hello") << "; " 
                   << function("World") << "\n";
            else
                std::cout << "Parse failed\n";
    
            if (f != l)
                std::cout << "Remaining unparsed: '" << std::string(f, l) << "'\n";
        }
    }
    

    Prints

    Parse resulted in function() -> character H; W
    Parse resulted in function() -> character e; o
    Parse resulted in function() -> character l; r
    Parse resulted in function() -> character l; l
    Parse resulted in function() -> character o; d
    

    [1] MSVC2013 appeared to crash; gcc may appear to work at -O3 but segfaults at -O0, etc.