
Parsing characters into an std::map<char,int> using boost::qi


I am trying to parse a sequence of characters separated by "," into a std::map<char,int>, where the key is the character and the value is just a count of the parsed characters. For example, if the input is

 a,b,c

The map should contain the pairs:

(a,1) , (b,2) , (c,3) 

Here's the code I am using:

namespace myparser
{
    std::map<int, std::string> mapping;
    namespace qi = boost::spirit::qi;
    namespace ascii = boost::spirit::ascii;
    namespace phoenix = boost::phoenix;
    int i = 0;

    template <typename Iterator>
    bool parse_numbers(Iterator first, Iterator last, std::map<char,int>& v)
    {
        using qi::double_;
        using qi::char_;
        using qi::phrase_parse;
        using qi::_1;
        using ascii::space;
        using phoenix::push_back;

        bool r = phrase_parse(first, last,

            //  Begin grammar
            (

               
                char_[v.insert(std::make_pair(_1,0)]
                    >> *(',' >> char_[v.insert(std::make_pair(_1,0)])
            )
            ,
            //  End grammar

            space);

        if (first != last) // fail if we did not get a full match
            return false;
        return r;
    }
    //]
}

Then I try to print the pair in main like this:

int main() {
  std::string str;
    while (getline(std::cin, str))
    {
        if (str.empty() || str[0] == 'q' || str[0] == 'Q')
            break;

        std::map<char,int> v;
        std::map<std::string, int>::iterator it = v.begin();
        if (myparser::parse_numbers(str.begin(), str.end(), v))
        {
            std::cout << "-------------------------\n";
            std::cout << "Parsing succeeded\n";
            std::cout << str << " Parses OK: " << std::endl;

        while (it != v.end())
        {
        // Accessing KEY from element pointed by it.
        std::string word = it->first;
        // Accessing VALUE from element pointed by it.
        int count = it->second;
        std::cout << word << " :: " << count << std::endl;
        // Increment the Iterator to point to next entry
        it++;
         }

            std::cout << "\n-------------------------\n";
        }
        else
        {
            std::cout << "-------------------------\n";
            std::cout << "Parsing failed\n";
            std::cout << "-------------------------\n";
        }
    }
return 0;
}

I am a beginner and I don't know how to fix this code. I also want to use strings instead of characters, so that I can enter a sequence of strings separated by "," and store them in a map similar to the one mentioned above. I would appreciate any help!


Solution

  • You cannot use Phoenix placeholders outside Phoenix deferred actors. For example, the type of std::make_pair(qi::_1, 0) is std::pair<boost::phoenix::actor<boost::phoenix::argument<0>>, int>.

    Nothing interoperates with such a thing. Certainly not std::map<>::insert.
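
    To make the type problem concrete, here is a minimal sketch that needs no Boost at all. `placeholder_t` and `arg1` are hypothetical stand-ins for a lazy placeholder such as qi::_1; they are not Spirit or Phoenix names. It only shows that std::make_pair stores whatever it is handed at that moment, which is the placeholder object and not a character parsed later.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a lazy placeholder such as qi::_1.
struct placeholder_t {};
constexpr placeholder_t arg1{};

// std::make_pair deduces its types from the arguments it receives right now,
// so the pair holds the placeholder itself, never a parsed character.
using eager_pair = decltype(std::make_pair(arg1, 0));

static_assert(std::is_same<eager_pair, std::pair<placeholder_t, int>>::value,
              "the pair stores the placeholder type");

// A std::map<char, int> stores std::pair<char const, int>; the eager pair
// cannot be converted to that, so map::insert has nothing it can accept.
static_assert(!std::is_convertible<eager_pair, std::pair<char const, int>>::value,
              "no conversion to the map's value_type");
```

    With the real qi::_1 the first element is a Phoenix actor instead of `placeholder_t`, but the mismatch is of the same kind.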

    What you'd need to do is wrap all the operations in semantic actions as Phoenix actors.

    #include <boost/phoenix.hpp>
    namespace px = boost::phoenix;
    

    Then you can:

    #include <boost/phoenix.hpp>
    #include <boost/spirit/include/qi.hpp>
    
    namespace qi = boost::spirit::qi;
    namespace px = boost::phoenix;
    
    namespace myparser {
        using Map = std::map<char, int>;
    
        template <typename Iterator>
        bool parse_numbers(Iterator first, Iterator last, Map& m) {
            auto action = px::insert(px::ref(m), px::end(px::ref(m)),
                                     px::construct<std::pair<char, int>>(qi::_1, 0));
    
            bool r = qi::phrase_parse( //
                first, last,
                //  Begin grammar
                qi::char_[action] >> *(',' >> qi::char_[action]),
                //  End grammar
                qi::space);
    
            return r && first == last;
        }
    } // namespace myparser
    

    Easy peasy. Right.

    I spent half an hour on that thing debugging why it wouldn't work. Why is this so hard?

    It's because someone invented a whole meta-DSL to write "normal C++" but with deferred execution. Back when that happened it was pretty neat, but it is the mother of all leaky abstractions, with razor-sharp edges.
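
    As a rough picture of what "deferred execution" means here, the following Boost-free sketch separates describing the work from doing it. `for_each_match` is a hypothetical stand-in for a parser that fires a semantic action per match; it is not part of Spirit.

```cpp
#include <functional>
#include <map>
#include <string>

using Map = std::map<char, int>;

// Hypothetical stand-in for a parser with a semantic action: it calls
// `action` once for every non-comma character it walks over.
inline void for_each_match(std::string const& input,
                           std::function<void(char)> const& action) {
    for (char ch : input)
        if (ch != ',')
            action(ch);
}

inline Map collect(std::string const& input) {
    Map m;
    // Building the callable inserts nothing; it only describes the work.
    auto deferred = [&m](char ch) { m.emplace(ch, 0); };
    // The insertions happen here, once per matched character.
    for_each_match(input, deferred);
    return m;
}
```

    A Phoenix actor plays the role of `deferred`: an object that is built once when the grammar is written and invoked later with the matched attribute.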

    So, what's new? Using C++11 you could:

    template <typename Iterator>
    bool parse_numbers(Iterator first, Iterator last, Map& m) {
        struct action_f {
            Map& m_;
            void operator()(char ch) const { m_.emplace(ch, 0); }
        };
        px::function<action_f> action{{m}};
    
        bool r = qi::phrase_parse( //
            first, last,
            //  Begin grammar
            qi::char_[action(qi::_1)] >> *(',' >> qi::char_[action(qi::_1)]),
            //  End grammar
            qi::space);
    
        return r && first == last;
    }
    

    Or using C++17:

    template <typename Iterator>
    bool parse_numbers(Iterator first, Iterator last, Map& m) {
        px::function action{[&m](char ch) { m.emplace(ch, 0); }};
    
        bool r = qi::phrase_parse( //
            first, last,
            //  Begin grammar
            qi::char_[action(qi::_1)] >> *(',' >> qi::char_[action(qi::_1)]),
            //  End grammar
            qi::space);
    
        return r && first == last;
    }
    

    On a tangent, you probably wanted to count things, so, maybe use

    px::function action{[&m](char ch) { m[ch] += 1; }};
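
    "Count" can be read two ways, so here is a small Boost-free sketch of both, operating on characters that have already been parsed. The function names are made up for illustration. The first matches `m[ch] += 1`; the second is one guess at what the (a,1), (b,2), (c,3) example in the question asks for, namely numbering distinct keys in order of first appearance.

```cpp
#include <map>
#include <string>

using Map = std::map<char, int>;

// Reading 1: how many times each key occurred (what m[ch] += 1 gives).
inline Map count_occurrences(std::string const& keys) {
    Map m;
    for (char ch : keys)
        m[ch] += 1;
    return m;
}

// Reading 2: number distinct keys 1, 2, 3, ... in order of first appearance.
inline Map number_distinct(std::string const& keys) {
    Map m;
    for (char ch : keys) {
        // The argument is evaluated before emplace inserts anything, and
        // emplace leaves an existing entry untouched for repeated keys.
        m.emplace(ch, static_cast<int>(m.size()) + 1);
    }
    return m;
}
```

    Inside a semantic action, the body of either loop would become the body of the lambda.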
    

    By this time, you could switch to Spirit X3 (which requires C++14):

    #include <boost/spirit/home/x3.hpp>
    #include <map>
    
    namespace x3 = boost::spirit::x3;
    
    namespace myparser {
        using Map = std::map<char, int>;
    
        template <typename Iterator>
        bool parse_numbers(Iterator first, Iterator last, Map& m) {
            auto action = [&m](auto& ctx) { m[_attr(ctx)] += 1; };
    
            return x3::phrase_parse( //
                    first, last,
                    //  Begin grammar
                    x3::char_[action] >> *(',' >> x3::char_[action]) >> x3::eoi,
                    //  End grammar
                    x3::space);
        }
    } // namespace myparser
    

    Now finally, let's simplify. p >> *(',' >> p) is just a clumsy way of saying p % ',':

    template <typename Iterator>
    bool parse_numbers(Iterator first, Iterator last, Map& m) {
        auto action = [&m](auto& ctx) { m[_attr(ctx)] += 1; };
    
        return x3::phrase_parse(     //
            first, last,             //
            x3::char_[action] % ',' >> x3::eoi, // eoi keeps the full-match check
            x3::space);
    }
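
    For readers new to the list operator, this is roughly what `item % ','` does, written by hand without Boost. It is a sketch under simplifying assumptions: `parse_char_list` is a hypothetical name, an item is any single character other than ',' (stricter than x3::char_, which also accepts ','), and no whitespace is skipped.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hand-written equivalent of `item % ','` followed by an end-of-input check.
// Matched items are appended to `out`; returns true on a full match.
inline bool parse_char_list(std::string const& input, std::vector<char>& out) {
    std::size_t pos = 0;

    // item: exactly one character that is not the separator
    auto item = [&]() -> bool {
        if (pos < input.size() && input[pos] != ',') {
            out.push_back(input[pos++]);
            return true;
        }
        return false;
    };

    if (!item())
        return false; // a list needs at least one item

    // then zero or more (separator, item) pairs
    while (pos < input.size() && input[pos] == ',') {
        std::size_t const before_separator = pos;
        ++pos;
        if (!item()) {
            pos = before_separator; // separator without item: back up and stop
            break;
        }
    }
    return pos == input.size(); // reject anything left over
}
```

    So "a,b,c" matches with three items, while "a,b," and the empty string do not.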
    

    And you wanted words, not characters:

    #include <boost/spirit/home/x3.hpp>
    #include <map>
    #include <string>
    
    namespace x3 = boost::spirit::x3;
    
    namespace myparser {
        using Map = std::map<std::string, int>;
    
        template <typename Iterator>
        bool parse_numbers(Iterator first, Iterator last, Map& m) {
            auto action = [&m](auto& ctx) { m[_attr(ctx)] += 1; };
    
            auto word_ = (*~x3::char_(','))[action];
    
            return phrase_parse(first, last, word_ % ',', x3::space);
        }
    } // namespace myparser
    
    #include <iomanip>
    #include <iostream>
    
    int main() {
        for (std::string const str : {"foo,c++ is strange,bar,qux,foo,c++       is strange   ,cuz"}) {
            std::map<std::string, int> m;
    
            std::cout << "Parsing " << std::quoted(str) << std::endl;
    
            if (myparser::parse_numbers(str.begin(), str.end(), m)) {
                std::cout << m.size() << " words:\n";
                for (auto& [word,count]: m)
                    std::cout << " - " << std::quoted(word) << " :: " << count << std::endl;
            } else {
                std::cerr << "Parsing failed\n";
            }
        }
    }
    

    Prints

    Parsing "foo,c++ is strange,bar,qux,foo,c++       is strange   ,cuz"
    5 words:
     - "bar" :: 1
     - "c++isstrange" :: 2
     - "cuz" :: 1
     - "foo" :: 2
     - "qux" :: 1
    

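    Note the behaviour of x3::space as the skipper (the same applies to qi::space and qi::ascii::space above): whitespace is skipped inside words too, which is why "c++ is strange" and "c++       is strange   " are both counted as "c++isstrange".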
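
    The effect of the skipper on the output above can be reproduced without Boost. The sketch below is only an imitation with made-up function names: the first function drops whitespace before every character, the way a skipper does; the second keeps inner whitespace and trims only the ends of each word, which is one possible alternative. In X3 itself, the usual tool for suppressing the skipper inside a word is x3::lexeme[...], which this sketch does not use.

```cpp
#include <cstddef>
#include <cctype>
#include <map>
#include <string>

using Map = std::map<std::string, int>;

// Imitates `*~char_(',') % ','` under a whitespace skipper: whitespace is
// discarded before every character, so it never becomes part of a word.
inline Map count_words_skipping(std::string const& input) {
    Map m;
    std::string word;
    for (char ch : input) {
        if (std::isspace(static_cast<unsigned char>(ch)))
            continue;
        if (ch == ',') {
            m[word] += 1;
            word.clear();
        } else {
            word += ch;
        }
    }
    m[word] += 1; // the last word has no trailing comma
    return m;
}

// Helper: strip whitespace from both ends only.
inline std::string trim(std::string const& s) {
    char const* const ws = " \t\r\n";
    std::size_t const b = s.find_first_not_of(ws);
    if (b == std::string::npos)
        return "";
    std::size_t const e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Alternative: split on ',' and keep whitespace inside each word.
inline Map count_words_trimmed(std::string const& input) {
    Map m;
    std::size_t start = 0;
    for (;;) {
        std::size_t const comma = input.find(',', start);
        m[trim(input.substr(start, comma - start))] += 1;
        if (comma == std::string::npos)
            break;
        start = comma + 1;
    }
    return m;
}
```

    On the sample input, the first function yields the five entries printed above, while the second keeps "c++ is strange" and "c++       is strange" as two separate keys.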