Tags: c++, boost, boost-spirit, boost-spirit-qi

Parsing two vectors of strings using boost::spirit::qi


I am new to using qi, and have run into a difficulty. I wish to parse an input like:

X + Y + Z , A + B

Into two vectors of strings.

I have code that does this, but only if the grammar parses single characters. Ideally, the following line should parse as well:

Xi + Ye + Zou , Ao + Bi

Using a simple replacement such as elem = +(char_ - '+') % '+' fails to parse, because it will consume the ',' on the first elem, but I've not discovered a simple way around this.

Here is my single-character code, for reference:

#include <iostream>
#include <string>
#include <vector>

#define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>

namespace qi = boost::spirit::qi;
namespace phx = boost::phoenix;

typedef std::vector<std::string> element_array;

struct reaction_t
{
  element_array reactants;
  element_array products;
};

BOOST_FUSION_ADAPT_STRUCT(reaction_t, (element_array, reactants)(element_array, products))

template<typename Iterator>
struct reaction_parser : qi::grammar<Iterator,reaction_t(),qi::blank_type>
{
    reaction_parser() : reaction_parser::base_type(reaction)
    {
        using namespace qi;

        elem = char_ % '+';
        reaction = elem >> ',' >> elem;

        BOOST_SPIRIT_DEBUG_NODES((reaction)(elem));
    }
    qi::rule<Iterator, reaction_t(), qi::blank_type> reaction;
    qi::rule<Iterator, element_array(), qi::blank_type> elem;
};

int main()
{
    const std::string input = "X + Y + Z, A + B";
    auto f = begin(input), l = end(input);

    reaction_parser<std::string::const_iterator> p;
    reaction_t data;

    bool ok = qi::phrase_parse(f, l, p, qi::blank, data);

    if (ok) std::cout << "success\n";
    else    std::cout << "failed\n";

    if (f!=l)
        std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
}

Solution

  • Using a simple replacement such as elem = +(char_ - '+') % '+' fails to parse, because it will consume the ',' on the first elem, but I've not discovered a simple way around this.

    Well, the complete (braindead) simple solution would be to use +(char_ - '+' - ',') or +~char_("+,").

    Really, though, I'd make the rule for element more specific, e.g.:

        elem     = qi::lexeme [ +alpha ] % '+';
    

    See "Boost spirit skipper issues" for background on lexeme and skippers.

    Live On Coliru

    #include <boost/fusion/adapted.hpp>
    #include <boost/spirit/include/qi.hpp>
    #include <boost/spirit/include/phoenix.hpp>
    #include <iostream>
    #include <string>
    #include <vector>
    
    namespace qi = boost::spirit::qi;
    namespace phx = boost::phoenix;
    
    typedef std::vector<std::string> element_array;
    
    struct reaction_t
    {
        element_array reactants;
        element_array products;
    };
    
    BOOST_FUSION_ADAPT_STRUCT(reaction_t, (element_array, reactants)(element_array, products))
    
    template<typename Iterator>
    struct reaction_parser : qi::grammar<Iterator,reaction_t(),qi::blank_type>
    {
        reaction_parser() : reaction_parser::base_type(reaction) {
            using namespace qi;
    
            elem     = qi::lexeme [ +alpha ] % '+';
            reaction = elem >> ',' >> elem;
    
            BOOST_SPIRIT_DEBUG_NODES((reaction)(elem));
        }
        qi::rule<Iterator, reaction_t(), qi::blank_type> reaction;
        qi::rule<Iterator, element_array(), qi::blank_type> elem;
    };
    
    int main()
    {
        reaction_parser<std::string::const_iterator> p;
    
        for (std::string const input : {
                "X + Y + Z, A + B",
                "Xi + Ye + Zou , Ao + Bi",
                })
        {
            std::cout << "----- " << input << "\n";
            auto f = begin(input), l = end(input);
    
            reaction_t data;
    
            bool ok = qi::phrase_parse(f, l, p, qi::blank, data);
    
            if (ok) {
                std::cout << "success\n";
                // Iterate by const reference; "prod" avoids shadowing the parser "p".
                for (auto const& r : data.reactants)   { std::cout << "reactant: " << r << "\n"; }
                for (auto const& prod : data.products) { std::cout << "product:  " << prod << "\n"; }
            }
            else
                std::cout << "failed\n";
    
            if (f != l)
                std::cout << "Remaining unparsed: '" << std::string(f, l) << "'\n";
        }
    }
    

    Printing:

    ----- X + Y + Z, A + B
    success
    reactant: X
    reactant: Y
    reactant: Z
    product:  A
    product:  B
    ----- Xi + Ye + Zou , Ao + Bi
    success
    reactant: Xi
    reactant: Ye
    reactant: Zou
    product:  Ao
    product:  Bi