Tags: c++, boost-spirit, boost-spirit-qi

Using boost::spirit defaulting a parsed value to an earlier value when parsing into a struct


I'm generally familiar with using qi::attr to implement a "default value" for a missing entry in parsed input. But I haven't seen how to do this when the default value needs to be pulled from an earlier parse.

I'm trying to parse into the following struct:

struct record_struct {

    std::string Name;
    uint8_t Distance;
    uint8_t TravelDistance;
    std::string Comment;
};

The input has a relatively simple "(text) (number) [(number)] [//comment]" format, where both the second number and the comment are optional. If the second number is not present, its value should be set to the same as the first number.

What follows is a cut-down example of working code that doesn't QUITE do what I want: this version just defaults to 0 rather than to the first number. If possible, I'd like to isolate the parsing of the two integers into a separate parser rule, without giving up the fusion-adapted struct.

Things I've tried that haven't compiled:

  • Replacing qi::attr(0) with qi::attr(qi::_2)
  • Modifying the value after the fact with a semantic action on the attr match: qi::attr(0)[qi::_3 = qi::_2]

The full test code:

#include <string>
#include <cstdint>
#include <iostream>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/include/qi.hpp>

struct record_struct {

    std::string Name;
    uint8_t Distance;
    uint8_t TravelDistance;
    std::string Comment;
};

BOOST_FUSION_ADAPT_STRUCT(
    record_struct,
    (std::string, Name)
    (uint8_t, Distance)
    (uint8_t, TravelDistance)
    (std::string, Comment)
)

std::ostream &operator<<(std::ostream &o, const record_struct &s) {
    o << s.Name << " (" << +s.Distance << ":" << +s.TravelDistance << ") " << s.Comment;
    return o;
}

bool test(std::string s) {
    std::string::const_iterator iter = s.begin();
    std::string::const_iterator end = s.end();
    record_struct result;
    namespace qi = boost::spirit::qi;
    bool parsed = boost::spirit::qi::parse(iter, end, (
                    +(qi::alnum | '_') >> qi::omit[+qi::space]
                    >> qi::uint_ >> ((qi::omit[+qi::space] >> qi::uint_) | qi::attr(0))
                    >> ((qi::omit[+qi::space] >> "//" >> +qi::char_) | qi::attr(""))
                ), result);
    if (parsed) std::cout << "Parsed: " << result << "\n";
    else std::cout << "Failed: " << std::string(iter, end) << "\n";
    return parsed;
}

int main(int argc, char **argv) {

    if (!test("Milan 20 22")) return 1;
    if (!test("Paris 8 9 // comment")) return 1;
    if (!test("London 5")) return 1;
    if (!test("Rome 1 //not a real comment")) return 1;
    return 0;
}

Output:

Parsed: Milan (20:22)
Parsed: Paris (8:9)  comment
Parsed: London (5:0)
Parsed: Rome (1:0) not a real comment

Output I want to see:

Parsed: Milan (20:22)
Parsed: Paris (8:9)  comment
Parsed: London (5:5)
Parsed: Rome (1:1) not a real comment

Solution

  • First of all, instead of spelling out omit[+space], just use a skipper:

    bool parsed = qi::phrase_parse(iter, end, (
                       qi::lexeme[+(alnum | '_')]
                    >> uint_ >> (uint_ | attr(0))
                    >> (("//" >> lexeme[+qi::char_]) | attr(""))
                ), qi::space, result);
    

    Here, qi::space is the skipper, and lexeme[] disables skipping inside its sub-expression (see "Boost spirit skipper issues").

    Next, there is more than one way to get the default you want.

    1. Use a local attribute to temporarily store a value:

      rule<It, record_struct(), locals<uint8_t>, space_type> g;
      
      g %= lexeme[+(alnum | '_')]
           >> uint_ [_a = _1] >> (uint_ | attr(_a))
           >> -("//" >> lexeme[+char_]);
      
      parsed = phrase_parse(iter, end, g, space, result);
      

      This requires

      • a qi::rule declaration, in order to declare the qi::locals<uint8_t>; qi::_a is the placeholder for that local attribute
      • initializing the rule as an "auto-rule" (see the Spirit docs), i.e. with %=, so that semantic actions do not suppress automatic attribute propagation
    2. There's a wacky hybrid where you don't actually use locals<> but instead refer to an external variable. This is generally a bad idea, but since your parser is not recursive or reentrant, you could do it:

      parsed = phrase_parse(iter, end, (
                     lexeme[+(alnum | '_')]
                  >> uint_ [ phx::ref(dist_) = _1 ] >> (uint_ | attr(phx::ref(dist_)))
                  >> (("//" >> lexeme[+char_]) | attr(""))
              ), space, result);
      
    3. You could go full Boost Phoenix and juggle the values directly in Phoenix expressions, here by reading the already-parsed Distance member through _val:

      parsed = phrase_parse(iter, end, (
                     lexeme[+(alnum | '_')]
                  >> uint_ >> (uint_ | attr(phx::at_c<1>(_val)))
                  >> (("//" >> lexeme[+char_]) | attr(""))
              ), space, result);
      
    4. You could parse into optional<uint8_t> and post-process the result:

      std::string              name;
      uint8_t                  distance;
      boost::optional<uint8_t> travelDistance;
      std::string              comment;
      
      parsed = phrase_parse(iter, end, (
                     lexeme[+(alnum | '_')]
                  >> uint_ >> -uint_
                  >> -("//" >> lexeme[+char_])
              ), space, name, distance, travelDistance, comment);
      
      result = { name, distance, travelDistance? *travelDistance : distance, comment };
      

    Post Scriptum

    I noticed this a little late:

    If possible, I'd like to isolate the parsing of the two integers to a separate parser rule, without giving up using the fusion struct.

    Well, of course you can:

    rule<It, uint8_t(uint8_t)> def_uint8 = uint_parser<uint8_t>() | attr(_r1);
    

    This is also more accurate, because uint_parser<uint8_t> fails on unsigned values that don't fit in a uint8_t. It can be mixed and matched with the approaches above.