How to insert a character at the beginning of a string attribute in Spirit Qi


I have the following rule:

rule<std::string::const_iterator, std::string()> t_ffind, t_sim, t_hash, t_state;

t_ffind = hold[(attr('$') >> t_sim >> t_hash >> t_state)] | t_sim;

This means that t_sim can appear either alone or followed by t_hash and t_state. If it appears alone, t_ffind takes the exact value of t_sim; otherwise, I also insert a marker character at the beginning of the string.

But written like that, the rule parses t_sim twice, so I changed it to:

t_ffind = t_sim >> -(qi::hold[t_hash >> t_state]);

That leaves the problem of inserting the character when (t_hash >> t_state) is present. I think the solution could be a semantic action at the end:

t_ffind = t_sim >> -(qi::hold[t_hash >> t_state])[];

But I can't figure out how to do that. A solution that doesn't involve a semantic action would be even better.


Solution

  • I'd say the idea of "adding a magic character to some unrelated attribute" is a questionable design choice. In general, I recommend keeping parsing and program logic separate. So I'd parse into

    namespace ast {
      struct t_ffind {
          std::string t_sim;
          boost::optional<std::string> t_hash, t_state; // or whatever the types are
      };
    }
    

    Or, if you really don't have a reason to model the hash/state tokens as separate fields, you could do

    namespace ast {
      struct t_ffind {
          std::string t_sim_hash_state;
          bool sim_only;
      };
    }
    

    but setting sim_only from within a semantic action gets more complicated. This is close to the issue you are facing.
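To illustrate what keeping parsing and program logic separate buys you, here is a minimal sketch that derives the '$' marker from the parsed AST only when a string is needed. It assumes the first struct above, with std::optional standing in for boost::optional so the sketch needs no Boost; to_marked_string is a hypothetical helper, not part of Spirit.

```cpp
#include <optional>
#include <string>

namespace ast {
    // Mirrors the first struct above; std::optional replaces boost::optional
    // (an assumption made to keep this sketch Boost-free).
    struct t_ffind {
        std::string t_sim;
        std::optional<std::string> t_hash, t_state;
    };
}

// Hypothetical helper: builds the '$'-marked string on demand, so the
// parser itself never has to inject the marker into an attribute.
std::string to_marked_string(const ast::t_ffind& node) {
    if (node.t_hash && node.t_state) {
        return '$' + node.t_sim + *node.t_hash + *node.t_state;
    }
    return node.t_sim;  // t_sim alone: no marker
}
```

With this shape, the parser only fills in fields, and the decision to prefix a marker lives in ordinary code that is easy to test and change.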

    Your Wish

    Just for fun, let's see what we could do. First, optimizing away the repeated parsing of t_sim smells like premature optimization. But perhaps you could use a semantic action to alter _val:

    t_ffind %= t_sim >> -(as_string[t_hash >> t_state] [ insert(_val, begin(_val), '$') ]);
    

    Note the use of as_string[] to glue the attributes of t_hash and t_state together, and of %= instead of =, so that automatic attribute propagation keeps working even though the rule contains a semantic action. I strongly suspect this is a bigger performance hit than potentially parsing t_sim twice.

    You can try to wrangle more control from Spirit:

    t_ffind = (t_sim >> -(as_string[t_hash >> t_state])) 
        [ if_(_2) [ _val = '$' + _1 + *_2 ].else_ [ _val = _1 ] ];
    

    This still uses the intermediate as_string concatenation. You can forgo it:

    t_ffind = (t_sim >> -(t_hash >> t_state))
        [ if_(_2) 
            [ _val = '$' + _1 + at_c<0>(*_2) + at_c<1>(*_2) ]
          .else_ 
            [ _val = _1 ] 
        ];
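
For readers less used to Phoenix, the lazy expression above computes roughly what the following eager C++ does. This is only an analogy under stated assumptions: hash_state is a hypothetical stand-in for the attribute of -(t_hash >> t_state), using an optional std::pair where Spirit uses an optional Fusion sequence, and combine is a made-up name.

```cpp
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for the attribute of -(t_hash >> t_state):
// .first plays the role of at_c<0>, .second the role of at_c<1>.
using hash_state = std::optional<std::pair<std::string, std::string>>;

// Eager counterpart of
//   if_(_2)[ _val = '$' + _1 + at_c<0>(*_2) + at_c<1>(*_2) ].else_[ _val = _1 ]
// where `sim` corresponds to _1 and `rest` to _2.
std::string combine(const std::string& sim, const hash_state& rest) {
    if (rest) {
        return '$' + sim + rest->first + rest->second;
    }
    return sim;
}
```

The Phoenix version builds the same logic as a deferred function object that Spirit invokes after the sequence has matched.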
    

    By now, we're getting ridiculously far adrift for very little gain (if any). I'd suggest either

    1. writing it the naive way:

      t_ffind = hold[(attr('$') >> t_sim >> t_hash >> t_state)] | t_sim;
      
    2. fixing your AST to mirror the thing you're parsing

    3. writing the parser manually
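
Option 3 could look roughly like the sketch below. It assumes the toy grammar from the demo that follows (t_sim is the literal "sim", t_hash is one or more digits, t_state is "on" or "off"); the real sub-rules in the question are not shown, so treat parse_ffind and its signature as hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <initializer_list>
#include <optional>
#include <string>

// Hand-written sketch for the assumed toy grammar:
//   t_sim = "sim", t_hash = one or more digits, t_state = "on" | "off".
// Precondition: pos <= in.size().
// Returns std::nullopt if the input at `pos` does not start with t_sim.
// On success, `pos` is advanced past exactly what was consumed.
std::optional<std::string> parse_ffind(const std::string& in, std::size_t& pos) {
    static const std::string sim = "sim";
    if (in.compare(pos, sim.size(), sim) != 0) return std::nullopt;
    const std::size_t start = pos;
    pos += sim.size();  // t_sim is matched exactly once

    // Tentatively scan t_hash >> t_state without committing `pos`.
    std::size_t p = pos;
    while (p < in.size() && std::isdigit(static_cast<unsigned char>(in[p]))) ++p;
    if (p > pos) {
        for (const char* state : {"on", "off"}) {
            const std::string s(state);
            if (in.compare(p, s.size(), s) == 0) {
                p += s.size();
                std::string marked = '$' + in.substr(start, p - start);
                pos = p;  // commit the optional tail
                return marked;
            }
        }
    }
    // t_sim alone: any unmatched tail stays unconsumed.
    return in.substr(start, sim.size());
}
```

Called with pos = 0 on "sim78off" this yields "$sim78off"; on "sim" it yields "sim". An incomplete tail such as "sim78" also yields "sim" and leaves "78" unconsumed, like the backtracking optional in the Spirit rules.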

    Full Demo

    All the above variations:

    #include <boost/spirit/include/qi.hpp>
    #include <boost/spirit/include/phoenix.hpp>
    #include <boost/spirit/include/phoenix_fusion.hpp>
    #include <functional>
    #include <iostream>
    #include <string>
    #include <vector>
    
    int main() {
        using namespace boost::spirit::qi;
    
        rule<std::string::const_iterator, std::string()> 
            t_sim   = "sim",
            t_hash  = +digit,
            t_state = raw[lit("on")|"off"],
            t_ffind;
    
        for (auto initialize_t_ffind : std::vector<std::function<void()> > {
         [&] { t_ffind = hold[(attr('$') >> t_sim >> t_hash >> t_state)] | t_sim; },
         [&] {
                 // this works:
                 using boost::phoenix::insert;
                 using boost::phoenix::begin;
                 t_ffind %= t_sim >> -(as_string[t_hash >> t_state] [ insert(_val, begin(_val), '$') ]);
             },
         [&] { 
                // this works too:
                using boost::phoenix::if_;
                t_ffind = (t_sim >> -(as_string[t_hash >> t_state])) 
                    [ if_(_2) 
                        [ _val = '$' + _1 + *_2 ]
                      .else_ 
                        [ _val = _1 ] 
                    ];
             },
         [&] {
                 // "total control":
                using boost::phoenix::if_;
                using boost::phoenix::at_c;
                t_ffind = (t_sim >> -(t_hash >> t_state))
                    [ if_(_2) 
                        [ _val = '$' + _1 + at_c<0>(*_2) + at_c<1>(*_2) ]
                      .else_ 
                        [ _val = _1 ] 
                    ];
            } })
    
         {
             initialize_t_ffind();
    
             for (std::string const s : { "sim78off", "sim" })
             {
                 auto f = s.begin(), l = s.end();
                 std::string result;
                 if (parse(f, l, t_ffind, result)) {
                     std::cout << "Parsed: '" << result << "'\n";
                 } else {
                     std::cout << "Parse failed\n";
                 }
    
                 if (f != l) {
                     std::cout << "Remaining input: '" << std::string(f,l) << "'\n";
                 }
             }
         }
    }
    

    Prints:

    Parsed: '$sim78off'
    Parsed: 'sim'
    Parsed: '$sim78off'
    Parsed: 'sim'
    Parsed: '$sim78off'
    Parsed: 'sim'
    Parsed: '$sim78off'
    Parsed: 'sim'