c++ parsing boost boost-spirit boost-spirit-qi

Attaching semantic actions to a parser with Boost Spirit

I am trying to understand what "attaching" a semantic action to a parser exactly means. More precisely, I would like to understand when, and for how long, the semantic action is bound to the parser.

To explore this, I slightly modified the employee.cpp example from the Boost Spirit library as follows:

1. I added a print() function whose only purpose is to trace when it is called:

void print(const struct employee & e) { std::cout << e.surname << "\n"; }

2. At the end of the employee_parser constructor, I bound the print() function to the start parser:

employee_parser() : employee_parser::base_type(start)
    {
        using qi::int_;
        using qi::lit;
        using qi::double_;
        using qi::lexeme;
        using ascii::char_;

        quoted_string %= lexeme['"' >> +(char_ - '"') >> '"'];

        start %=
            lit("employee")
            >> '{'
            >>  int_ >> ','
            >>  quoted_string >> ','
            >>  quoted_string >> ','
            >>  double_
            >>  '}'
            ;
        start[&print];
    }

Although it seems to me that I have attached the semantic action print to the start parser, as described in the documentation, the print() function is never called. It seems that the semantic action has to be attached on the right-hand side of a parser definition, once for every place the parser appears in that definition. Can anybody elaborate on this?

Solution

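In Spirit, a parser is a function object, and for the most part the operators that are overloaded to let you build new parsers, such as >> and so on, return new function objects rather than modifying their operands. The subscript operator used to attach a semantic action is one of them: p[f] returns a new parser that wraps p and f, and p itself is left unchanged.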

If you have ever used Java and run into its immutable strings, you can think of it as a bit like that.
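To make the "new object, original untouched" point concrete, here is a minimal sketch in plain C++17. It is a toy model of the idea, not Spirit's API: Lit, Action, run and immutability_demo are hypothetical names, and the toy action receives the matched text rather than a Spirit attribute.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Toy model, NOT Spirit's API: a parser is a small value type whose parse()
// advances `pos` past a successful match.
template <class P, class F> struct Action;  // defined below

struct Lit {
    std::string text;

    bool parse(const std::string& in, std::size_t& pos) const {
        if (in.compare(pos, text.size(), text) != 0) return false;
        pos += text.size();
        return true;
    }

    // p[f] does not modify p: it returns a NEW parser holding copies of p and f.
    template <class F>
    Action<Lit, F> operator[](F f) const { return {*this, f}; }
};

template <class P, class F>
struct Action {
    P subject;  // copy of the wrapped parser
    F action;   // the "semantic action"

    bool parse(const std::string& in, std::size_t& pos) const {
        const std::size_t start = pos;
        if (!subject.parse(in, pos)) return false;
        action(in.substr(start, pos - start));  // runs only after a match
        return true;
    }
};

template <class P>
bool run(const P& p, const std::string& in) {
    std::size_t pos = 0;
    return p.parse(in, pos);
}

struct DemoResult {
    int via_original;  // action calls after parsing with the original parser
    int via_wrapped;   // action calls after also parsing with p[f]
};

DemoResult immutability_demo() {
    int calls = 0;
    auto count = [&calls](const std::string&) { ++calls; };

    const Lit employee{"employee"};
    const auto tagged = employee[count];  // new object; employee is unchanged

    // employee is still a plain Lit; only `tagged` carries the action.
    static_assert(std::is_same_v<decltype(employee), const Lit>);

    DemoResult r{};
    run(employee, "employee");
    r.via_original = calls;  // still 0
    run(tagged, "employee");
    r.via_wrapped = calls;   // now 1
    return r;
}
```

Calling immutability_demo() yields via_original == 0 and via_wrapped == 1: the action only exists inside the object that operator[] returned.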

When you have an expression like

rule1 = lit("employee");
rule2 = (rule1 >> lit(",") >> rule1) [ &print ];

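What happens here is that a new parser object is produced and assigned to the variable rule2, and that parser object has the semantic action attached. rule1 itself is not modified; print runs once each time the whole parenthesized sequence matches.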
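A toy sketch of that expression (hypothetical names, not Spirit's API; rule1 is a plain literal here rather than a qi::rule) shows the scope of the action: print fires once, with the text of the entire match, and the two uses of rule1 trigger nothing on their own.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Toy model, NOT Spirit's API.
template <class P, class F> struct Action;

struct Lit {
    std::string text;
    bool parse(const std::string& in, std::size_t& pos) const {
        if (in.compare(pos, text.size(), text) != 0) return false;
        pos += text.size();
        return true;
    }
};

template <class P, class Q>
struct Seq {
    P left;
    Q right;
    bool parse(const std::string& in, std::size_t& pos) const {
        const std::size_t save = pos;
        if (left.parse(in, pos) && right.parse(in, pos)) return true;
        pos = save;  // undo partial progress on failure
        return false;
    }
    // (...)[f] wraps the WHOLE sequence, not its individual parts.
    template <class F>
    Action<Seq<P, Q>, F> operator[](F f) const { return {*this, f}; }
};

template <class P, class F>
struct Action {
    P subject;
    F action;
    bool parse(const std::string& in, std::size_t& pos) const {
        const std::size_t start = pos;
        if (!subject.parse(in, pos)) return false;
        action(in.substr(start, pos - start));
        return true;
    }
};

// Restrict operator>> to the toy parser types.
template <class T> struct is_parser : std::false_type {};
template <> struct is_parser<Lit> : std::true_type {};
template <class P, class Q> struct is_parser<Seq<P, Q>> : std::true_type {};

template <class P, class Q,
          class = std::enable_if_t<is_parser<P>::value && is_parser<Q>::value>>
Seq<P, Q> operator>>(P p, Q q) {
    return {std::move(p), std::move(q)};
}

std::vector<std::string> printed;  // stands in for std::cout
void print(const std::string& s) { printed.push_back(s); }

// Mirrors rule2 = (rule1 >> lit(",") >> rule1)[&print], with rule1 a literal.
bool parse_rule2(const std::string& in) {
    const Lit rule1{"employee"};
    const auto rule2 = (rule1 >> Lit{","} >> rule1)[&print];
    std::size_t pos = 0;
    return rule2.parse(in, pos) && pos == in.size();
}
```

Parsing "employee,employee" records exactly one entry, "employee,employee"; a failed parse such as "employee;employee" records none, because the action only runs after its whole subject succeeds.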

In fact, each operator in the expression produces a new temporary parser object. That overhead is paid once, when the parser is constructed; it does not matter at parse time.
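The "one new object per operator" point is visible in the types. In this hypothetical toy model (not Spirit's API, although Spirit likewise encodes an expression's structure in its type through expression templates), a >> b >> c groups as (a >> b) >> c, so the nesting is fixed at compile time and parse() merely walks it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>

// Toy model, NOT Spirit's API.
struct Lit {
    std::string text;
    bool parse(const std::string& in, std::size_t& pos) const {
        if (in.compare(pos, text.size(), text) != 0) return false;
        pos += text.size();
        return true;
    }
};

template <class P, class Q>
struct Seq {
    P left;
    Q right;
    bool parse(const std::string& in, std::size_t& pos) const {
        const std::size_t save = pos;
        if (left.parse(in, pos) && right.parse(in, pos)) return true;
        pos = save;  // undo partial progress on failure
        return false;
    }
};

template <class T> struct is_parser : std::false_type {};
template <> struct is_parser<Lit> : std::true_type {};
template <class P, class Q> struct is_parser<Seq<P, Q>> : std::true_type {};

// Each use of >> constructs a new Seq object holding copies of its operands.
template <class P, class Q,
          class = std::enable_if_t<is_parser<P>::value && is_parser<Q>::value>>
Seq<P, Q> operator>>(P p, Q q) {
    return {std::move(p), std::move(q)};
}

// x >> comma >> y groups as (x >> comma) >> y, so the nesting is in the type.
using Expected = Seq<Seq<Lit, Lit>, Lit>;

bool parse_all(const std::string& in) {
    const Lit x{"a"}, comma{","}, y{"b"};
    const auto xy = x >> comma >> y;  // two Seq objects built here, before parsing
    static_assert(std::is_same_v<decltype(xy), const Expected>);
    std::size_t pos = 0;
    return xy.parse(in, pos) && pos == in.size();
}
```

parse_all("a,b") succeeds, while "a;b" and "a,b," fail; the construction cost is in building xy, not in the calls to parse().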

When you have

start[&print];

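this is like producing a temporary value that is immediately discarded. It has no side effect on the start rule, which is why print is never called. For print to run, the expression start[&print] itself has to be used somewhere, for example by assigning it to another rule or by passing it to qi::parse.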
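Here is the question's situation replayed in a toy model (hypothetical names, not Spirit's API). Rule stores whatever expression is assigned to it behind a std::function, loosely like qi::rule, and print records the matched text instead of writing to std::cout so that the effect can be checked.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy model, NOT Spirit's API.
struct Lit {
    std::string text;
    bool parse(const std::string& in, std::size_t& pos) const {
        if (in.compare(pos, text.size(), text) != 0) return false;
        pos += text.size();
        return true;
    }
};

template <class F> struct RuleAction;  // defined below

// Loosely like qi::rule: stores a type-erased copy of the assigned expression.
struct Rule {
    std::function<bool(const std::string&, std::size_t&)> def;

    template <class P>
    Rule& operator=(P p) {
        def = [p](const std::string& in, std::size_t& pos) { return p.parse(in, pos); };
        return *this;
    }

    bool parse(const std::string& in, std::size_t& pos) const {
        return def && def(in, pos);
    }

    // start[f] builds a NEW parser that refers to this rule; `def` is untouched.
    template <class F>
    RuleAction<F> operator[](F f) const { return {this, f}; }
};

template <class F>
struct RuleAction {
    const Rule* subject;  // refers to the rule rather than copying it
    F action;
    bool parse(const std::string& in, std::size_t& pos) const {
        const std::size_t start = pos;
        if (!subject->parse(in, pos)) return false;
        action(in.substr(start, pos - start));
        return true;
    }
};

std::vector<std::string> printed;  // stands in for std::cout
void print(const std::string& s) { printed.push_back(s); }

// Returns {calls after the discarded statement, calls after using top}.
std::pair<std::size_t, std::size_t> discarded_vs_assigned() {
    printed.clear();
    Rule start, top;
    start = Lit{"employee"};

    start[&print];  // temporary RuleAction, destroyed at the ';'
    std::size_t pos = 0;
    start.parse("employee", pos);
    const std::size_t after_discard = printed.size();

    top = start[&print];  // the action is now part of top's definition
    pos = 0;
    top.parse("employee", pos);
    return {after_discard, printed.size()};
}
```

The result is {0, 1}: the statement start[&print]; changes nothing, while parsing through top calls print once. A toy library could additionally mark operator[] as [[nodiscard]] so that such a discarded statement draws a compiler warning.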

If it didn't work this way, writing grammars could be a lot more complicated.

When you define a grammar in Spirit Qi, the definition is usually done in the constructor of the grammar object. First the rules are declared, specifying their signatures, skippers, etc. Then you define the rules one by one. When a rule appears on the right-hand side of another rule's definition, Spirit holds it by reference, so you can mention a rule before it is defined (recursive grammars rely on this); what matters is that every rule is defined before you parse. The exception is a plain rule-to-rule assignment such as r1 = r2;, which copies r2's current definition, so r2 must already be defined at that point (or you write r1 = r2.alias(); to get a reference instead). Once a rule is defined, it usually does not change as far as the grammar is concerned, although you can still modify things like debug info.

Because rules refer to each other by reference, a rule that uses another one always sees that rule's current definition, so nothing has to be propagated between rules when a definition changes.

This is also where the performance trade-off sits. Within a single rule's right-hand side, the parser is an expression template, and its structure is resolved at compile time. A rule, however, stores its compiled definition behind a type-erased function object (a boost::function), so crossing a rule boundary costs an indirect call at parse time.
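The reference semantics can be sketched with the same kind of toy model (hypothetical names, not Spirit's API). Here a Rule keeps a type-erased definition, and ref_to() stands in for what Spirit does implicitly when a rule is mentioned inside another expression: it stores a pointer to the rule instead of a copy.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Toy model, NOT Spirit's API.
struct Lit {
    std::string text;
    bool parse(const std::string& in, std::size_t& pos) const {
        if (in.compare(pos, text.size(), text) != 0) return false;
        pos += text.size();
        return true;
    }
};

// Keeps its definition behind std::function (type erasure); in this model,
// that is where the runtime indirection lives.
struct Rule {
    std::function<bool(const std::string&, std::size_t&)> def;

    template <class P>
    Rule& operator=(P p) {
        def = [p](const std::string& in, std::size_t& pos) { return p.parse(in, pos); };
        return *this;
    }
    bool parse(const std::string& in, std::size_t& pos) const {
        return def && def(in, pos);
    }
};

// What an expression stores when it mentions a rule: a pointer, not a copy.
struct RuleRef {
    const Rule* target;
    bool parse(const std::string& in, std::size_t& pos) const {
        return target->parse(in, pos);
    }
};

RuleRef ref_to(const Rule& r) { return RuleRef{&r}; }

bool full_match(const Rule& r, const std::string& in) {
    std::size_t pos = 0;
    return r.parse(in, pos) && pos == in.size();
}

struct Results {
    bool after_def;    // start parses "bob" once name is defined
    bool after_redef;  // start follows a later redefinition of name
};

Results forward_reference_demo() {
    Rule start, name;
    start = ref_to(name);  // mentions `name` before it has a definition
    name = Lit{"bob"};
    Results r{};
    r.after_def = full_match(start, "bob");
    name = Lit{"alice"};
    r.after_redef = full_match(start, "alice");
    return r;
}
```

Both fields come out true: start was defined in terms of name before name existed, and it picks up every later change to name without any notification mechanism.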