
Boost spirit grammar rule to extract just alphanumeric tokens


I have a lexeme, shown below, for words that are alphanumeric.

attributes = lexeme[+(boost::spirit::qi::alpha|boost::spirit::qi::digit)];

I want a grammar rule that skips every character that does not match this rule and puts only the matching tokens into a vector.

For example:

    input:  STR1 + STR2 % STR3 () STR4 = STR5+ STR6
    output: (STR1, STR2, STR3, STR4, STR5, STR6)

I tried the grammar below, but after taking the first word it skips the rest of the input string. How can I change it to parse as I described?

namespace qi  = boost::spirit::qi;
namespace phx = boost::phoenix;

typedef std::vector<std::wstring> Attributes;
template <typename It, typename Skipper=boost::spirit::qi::space_type>
struct AttributeParser : boost::spirit::qi::grammar<It, Attributes(),  Skipper>
{
    AttributeParser() : AttributeParser::base_type(expression)
    {
        expression = 

            *( attributes [phx::push_back(qi::_val, qi::_1)])
            >> qi::omit[*qi::char_]
            ;

        attributes = qi::lexeme[+(qi::alpha|qi::digit)];

        BOOST_SPIRIT_DEBUG_NODE(expression);
        BOOST_SPIRIT_DEBUG_NODE(attributes);
    }


private:
    boost::spirit::qi::rule<It, std::wstring() , Skipper> attributes;
    boost::spirit::qi::rule<It, Attributes() , Skipper> expression;

};

Solution

  • I'd literally write what you describe:

        std::vector<std::wstring> parsed;
        bool ok = qi::phrase_parse(
                begin(input), end(input),
                *qi::lexeme [ +qi::alnum ],
                ~qi::alnum,
                parsed);
    

    Namely:

    • parse the (partial) input
    • match lexemes of alphanumeric characters
    • skip anything non-alphanumeric
    • put the results into the parsed vector

    Here's the full program:

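    #include <boost/spirit/include/qi.hpp>
    #include <iostream>
    #include <string>
    #include <vector>

    namespace qi = boost::spirit::qi;

    int main()
    {
        std::wstring input = L"STR1 + STR2 % STR3 () STR4 = STR5+ STR6";

        std::vector<std::wstring> parsed;
        // Parser: zero or more alphanumeric lexemes.
        // Skipper: any single character that is not alphanumeric.
        bool ok = qi::phrase_parse(begin(input), end(input),
                *qi::lexeme [ +qi::alnum ],
                ~qi::alnum,
                parsed);

        for(auto& v : parsed)
            std::wcout << v << std::endl;

        return ok ? 0 : 1;
    }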
    

    That prints

    STR1
    STR2
    STR3
    STR4
    STR5
    STR6
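
As a cross-check on what the skipper-based parse should yield, the same token extraction can be sketched without Spirit. This is only an illustration of the semantics (maximal runs of alphanumeric characters, everything else treated as a separator), not a replacement for the grammar above. The helper name `extract_alnum_tokens` is hypothetical, and it assumes `std::iswalnum` classifies characters the way you want for your locale.

```cpp
#include <cwctype>
#include <string>
#include <vector>

// Hypothetical helper, not part of Boost.Spirit: collects maximal runs of
// alphanumeric characters and treats every other character as a separator.
std::vector<std::wstring> extract_alnum_tokens(const std::wstring& input)
{
    std::vector<std::wstring> tokens;
    std::wstring current;

    for (wchar_t ch : input) {
        if (std::iswalnum(static_cast<std::wint_t>(ch))) {
            current += ch;                 // still inside a token
        } else if (!current.empty()) {
            tokens.push_back(current);     // a separator ends the current token
            current.clear();
        }
    }

    if (!current.empty())                  // flush a token that runs to the end
        tokens.push_back(current);

    return tokens;
}
```

Called on the input from the question, `extract_alnum_tokens(L"STR1 + STR2 % STR3 () STR4 = STR5+ STR6")` should return the six tokens STR1 through STR6, matching the Spirit output above.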