Tags: c++, boost-spirit, boost-spirit-qi, semantic-actions

Discarding parsed result after semantic action


In Boost.Spirit one can read from a stream to a std::vector simply by doing:

#include <cassert>
#include <string>
#include <vector>
#include <boost/spirit/include/qi.hpp>
namespace sqi = boost::spirit::qi;
int main(){
        std::string const v_str = "AA BB CC";
        std::vector<std::string> v;
        auto it = begin(v_str);
        // each run of capital letters becomes one element of v
        bool r = sqi::phrase_parse(it, end(v_str), 
                    (*sqi::lexeme[+sqi::char_("A-Z")]), sqi::space, v);
        assert( r and v.size() == 3  and v[2] == "CC" );
}

However, it happens that I know the number of elements in advance because of the input format and I should be able to prereserve the space in the vector. For example if the input string is "3 AA BB CC", one can allocate in advance three elements.

The question is how to pass this extra information to the vector and optimize the later push_back (e.g. avoiding reallocations).
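To make the cost in question concrete, here is a minimal sketch that uses only the standard library (no Spirit involved). The helper name `count_reallocations` is hypothetical; it counts how often `push_back` has to grow the buffer, with and without an up-front `reserve`.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: counts how many times push_back changes the capacity,
// i.e. how many reallocations happen while n elements are appended.
std::size_t count_reallocations(std::size_t n, bool reserve_first) {
    std::vector<std::string> v;
    if (reserve_first) v.reserve(n);  // one up-front allocation, not counted below
    std::size_t reallocations = 0;
    std::size_t cap = v.capacity();
    for (std::size_t i = 0; i != n; ++i) {
        v.push_back("AA");
        if (v.capacity() != cap) {    // the buffer was reallocated
            ++reallocations;
            cap = v.capacity();
        }
    }
    return reallocations;
}
```

With `reserve_first` set, the standard guarantees that no reallocation happens until the size exceeds the reserved capacity, so the count is zero. Without it, the exact count depends on the implementation's growth policy.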

What I tried was to parse an integer at the beginning and attach a semantic action to it in which reserve is executed.

        std::string const v_str = "3 AA BB CC";
        std::vector<std::string> v;
        auto it = begin(v_str);
        bool r = sqi::phrase_parse(it, end(v_str), 
             sqi::int_[([&](int i){v.reserve(i);})] >> 
                (*sqi::lexeme[+sqi::char_("A-Z")]), sqi::space, v);

The problem is that the integer is not discarded after the semantic action runs: from my tests I can see that the parser still tries to push the result (the 3 in the example) into the vector, even after the reserve.

Another workaround would be to pass an additional attribute argument to the phrase_parse function, but that seems like overkill.

So, how can I parse something in Boost.Spirit and only execute the semantic action without sending the result to the sink variable?

Even if this can be done I am not really sure if this is the right way to do it.


Solution

  • Thanks to the links that @sehe and @drus pointed me to, and after finding out about qi::omit, I realized I can attach a semantic action and then omit the result.

    The format I have to handle is redundant (the size is redundant with the number of elements), so I have to semantically omit something in any case.

        using namespace sqi;
        std::string const v_str = "3 AA BB CC";
        {
            std::vector<std::string> v;
            auto it = begin(v_str);
            bool r = sqi::phrase_parse(
                it, end(v_str), 
                omit[int_] >> *lexeme[+(char_-' ')],
                space, v
            );
            assert( v.size() == 3 and v[2] == "CC" );
        }
    

    But that doesn't mean I cannot use the omitted (redundant) part for optimization purposes or as a consistency check.

        {
            std::vector<std::string> v;
            auto it = begin(v_str);
            bool r = sqi::phrase_parse(
                it, end(v_str), 
                omit[int_[([&](int n){v.reserve(n);})]] >> *lexeme[+(char_-' ')],
                space, v
            );
            assert( v.size() == 3 and v[2] == "CC" );
        }
    

    I agree that semantic actions are evil, but in my opinion only when they change the state of the sink objects. One can argue that reserve does not change the state of the vector.
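The argument about reserve can be checked directly. The following is a small sketch using only the standard library; the function name `reserve_keeps_contents` is hypothetical. It compares the vector's elements and size before and after a call to `reserve`.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical check: reserve may grow the capacity, but it leaves the
// size and the stored elements as they were.
bool reserve_keeps_contents(std::size_t extra) {
    std::vector<std::string> v{"AA", "BB"};
    std::vector<std::string> const before = v;  // snapshot of size and elements
    v.reserve(v.size() + extra);
    bool const same_contents = (v == before);   // operator== compares size and elements
    bool const enough_room = v.capacity() >= before.size() + extra;
    return same_contents && enough_room;
}
```

One caveat: `reserve` can invalidate iterators, pointers, and references into the vector, so "does not change the state" holds for the contents, not for the addresses of the elements.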

    In fact, this way I can optimize memory usage with reserve and also the parser's execution by using repeat instead of the unbounded Kleene star (*). (Apparently repeat can be more efficient.)

        {
            std::vector<std::string> v;
            auto it = begin(v_str);
            int n = 0; // set by the semantic action; phx is assumed to be an alias for boost::phoenix
            bool r = sqi::phrase_parse(
                it, end(v_str), 
                omit[int_[([&](int nn){v.reserve(n = nn);})]] >> repeat(phx::ref(n))[lexeme[+(char_-' ')]],
                space, v
            );
            assert( n == v.size() and v.size() == 3 and v[2] == "CC" );
        }
    

    (Using phx::ref is essential: the parser expression is built before int_ has run, so the evaluation of n has to be delayed until repeat actually executes.)
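The need for delayed evaluation can be illustrated with a plain C++ analogy. This is not Spirit code, and the helper `eager_vs_deferred` is hypothetical: one lambda copies `n` when it is created (roughly what passing a plain `n` to `repeat` would do), the other holds a `std::ref` and reads `n` only when called (the role `phx::ref` plays above).

```cpp
#include <functional>
#include <utility>

// Hypothetical analogy, not Spirit code: returns {value seen by the eager copy,
// value seen through the reference wrapper} after n has been updated.
std::pair<int, int> eager_vs_deferred(int parsed_count) {
    int n = 0;  // stands in for the count that has not been parsed yet
    auto eager = [count = n] { return count; };             // copies n now
    auto deferred = [r = std::ref(n)] { return r.get(); };  // reads n when called
    n = parsed_count;  // stands in for the semantic action assigning n
    return {eager(), deferred()};
}
```

The eager copy still sees the value `n` had when the expression was built, while the deferred read sees the value assigned later.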