Understanding the List Operator (%) in Boost.Spirit


Can you help me understand the difference between the a % b parser and its expanded a >> *(b >> a) form in Boost.Spirit? Even though the reference manual states that they are equivalent,

The list operator, a % b, is a binary operator that matches a list of one or more repetitions of a separated by occurrences of b. This is equivalent to a >> *(b >> a).

the following program produces different results depending on which is used:

#include <iostream>
#include <string>
#include <vector>

#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/include/qi.hpp>

struct Record {
  int id;
  std::vector<int> values;
};

BOOST_FUSION_ADAPT_STRUCT(Record,
  (int, id)
  (std::vector<int>, values)
)

int main() {
  namespace qi = boost::spirit::qi;

  const auto str = std::string{"1: 2, 3, 4"};

  const auto rule1 = qi::int_ >> ':' >> (qi::int_ % ',')                 >> qi::eoi;
  const auto rule2 = qi::int_ >> ':' >> (qi::int_ >> *(',' >> qi::int_)) >> qi::eoi;

  Record record1;
  if (qi::phrase_parse(str.begin(), str.end(), rule1, qi::space, record1)) {
    std::cout << record1.id << ": ";
    for (const auto& value : record1.values) { std::cout << value << ", "; }
    std::cout << '\n';
  } else {
    std::cerr << "syntax error\n";
  }

  Record record2;
  if (qi::phrase_parse(str.begin(), str.end(), rule2, qi::space, record2)) {
    std::cout << record2.id << ": ";
    for (const auto& value : record2.values) { std::cout << value << ", "; }
    std::cout << '\n';
  } else {
    std::cerr << "syntax error\n";
  }
}

Output:

1: 2, 3, 4, 
1: 2, 

rule1 and rule2 are different only in that rule1 uses the list operator ((qi::int_ % ',')) and rule2 uses its expanded form ((qi::int_ >> *(',' >> qi::int_))). However, rule1 produced 1: 2, 3, 4, (as expected) and rule2 produced 1: 2,. I cannot understand the result of rule2: 1) why is it different from that of rule1 and 2) why were 3 and 4 not included in record2.values even though phrase_parse returned true somehow?


Solution

  • Update X3 version added

    First off, you have fallen into a deep trap here:

    Qi rules don't work with auto. Use qi::copy or just use qi::rule<>. Your program has undefined behaviour and indeed it crashed for me (valgrind pointed out where the dangling references originated).

    So, first off:

    const auto rule = qi::copy(qi::int_ >> ':' >> (qi::int_ % ',')                 >> qi::eoi); 
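
    The qi::rule<> alternative mentioned above could look roughly like the following. This is a minimal sketch, assuming the Record struct from the question; the helper name parse_record is hypothetical and exists only for illustration.

    ```cpp
    #include <string>
    #include <vector>

    #include <boost/fusion/include/adapt_struct.hpp>
    #include <boost/spirit/include/qi.hpp>

    namespace qi = boost::spirit::qi;

    struct Record {
        int id;
        std::vector<int> values;
    };

    BOOST_FUSION_ADAPT_STRUCT(Record, id, values)

    // Hypothetical helper (not part of the original program):
    // parses "id: v1, v2, ..." into a Record.
    bool parse_record(const std::string& input, Record& out) {
        using It = std::string::const_iterator;

        // A qi::rule stores its own copy of the parser expression, so no
        // reference to a destroyed temporary is left behind, unlike a bare
        // `auto` variable holding the raw expression.
        qi::rule<It, Record(), qi::space_type> record;
        record = qi::int_ >> ':' >> (qi::int_ % ',') >> qi::eoi;

        It first = input.begin();
        It last = input.end();
        return qi::phrase_parse(first, last, record, qi::space, out);
    }
    ```

    Called as parse_record("1: 2, 3, 4", r), this should fill r.id with 1 and r.values with 2, 3 and 4.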
    

    Now, when you delete the redundancy in the program, you get:

    Reproducing the problem

    int main() {
        test(qi::copy(qi::int_ >> ':' >> (qi::int_ % ',')));
        test(qi::copy(qi::int_ >> ':' >> (qi::int_ >> *(',' >> qi::int_))));
    }
    

    Printing

    1: 2, 3, 4, 
    1: 2, 
    

    The cause and the fix

    What happened to 3 and 4, which were successfully parsed?

    Well, the attribute propagation rules indicate that qi::int_ >> *(',' >> qi::int_) exposes a tuple<int, vector<int> >. In a bid to magically DoTheRightThing(TM) Spirit accidentally misfires and "assigns" the int into the attribute reference, ignoring the remaining vector<int>.

    If you want to make container attributes parse as "an atomic group", use qi::as<>:

    test(qi::copy(qi::int_ >> ':' >> qi::as<Record::values_t>() [ qi::int_ >> *(',' >> qi::int_)]));
    

    Here as<> acts as a barrier for the attribute compatibility heuristics and the grammar knows what you meant:

    #include <iostream>
    #include <string>
    #include <vector>
    
    #include <boost/fusion/include/adapt_struct.hpp>
    #include <boost/spirit/include/qi.hpp>
    
    struct Record {
      int id;
      using values_t = std::vector<int>;
      values_t values;
    };
    
    BOOST_FUSION_ADAPT_STRUCT(Record, id, values)
    
    namespace qi = boost::spirit::qi;
    
    template <typename T>
    void test(T const& rule) {
        const std::string str = "1: 2, 3, 4";
    
        Record record;
    
        if (qi::phrase_parse(str.begin(), str.end(), rule >> qi::eoi, qi::space, record)) {
            std::cout << record.id << ": ";
            for (const auto& value : record.values) { std::cout << value << ", "; }
            std::cout << '\n';
        } else {
            std::cerr << "syntax error\n";
        }
    }
    
    int main() {
        test(qi::copy(qi::int_ >> ':' >> (qi::int_ % ',')));
        test(qi::copy(qi::int_ >> ':' >> (qi::int_ >> *(',' >> qi::int_))));
        test(qi::copy(qi::int_ >> ':' >> qi::as<Record::values_t>() [ qi::int_ >> *(',' >> qi::int_)]));
    }
    

    Prints

    1: 2, 3, 4, 
    1: 2, 
    1: 2, 3, 4,
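
    A similar grouping can be had without qi::as<> by giving the expanded list its own qi::rule<> with a declared container attribute, since the rule then fills a single vector as one unit. The following is only a sketch of that idea; the helper name parse_expanded and the rule names list and record are hypothetical.

    ```cpp
    #include <string>
    #include <vector>

    #include <boost/fusion/include/adapt_struct.hpp>
    #include <boost/spirit/include/qi.hpp>

    namespace qi = boost::spirit::qi;

    struct Record {
        int id;
        using values_t = std::vector<int>;
        values_t values;
    };

    BOOST_FUSION_ADAPT_STRUCT(Record, id, values)

    // Hypothetical helper: same grammar as the expanded form, but the list
    // part lives in its own rule with a declared vector attribute.
    bool parse_expanded(const std::string& input, Record& out) {
        using It = std::string::const_iterator;

        // The declared attribute makes the whole sequence append into one
        // vector instead of exposing tuple<int, vector<int>> to the caller.
        qi::rule<It, Record::values_t(), qi::space_type> list;
        list = qi::int_ >> *(',' >> qi::int_);

        // The outer sequence now exposes (int, vector<int>), matching Record.
        qi::rule<It, Record(), qi::space_type> record;
        record = qi::int_ >> ':' >> list >> qi::eoi;

        It first = input.begin();
        It last = input.end();
        return qi::phrase_parse(first, last, record, qi::space, out);
    }
    ```

    Both rules are local to the function, so the reference that record keeps to list stays valid for the duration of the parse.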