c++ c++11 c++14 boost-spirit boost-spirit-x3

Empty strings in vector returned from boost spirit x3 parser


I want to scan a file for all enums (this is just an MCVE, so nothing complicated) and store the names of the enums in a std::vector. I build my parsers like this:

auto const any = x3::rule<class any_id, const x3::unused_type>{"any"}
               = ~x3::space;

auto const identifier = x3::rule<class identifier_id, std::string>{"identifier"}
                      = x3::lexeme[x3::char_("A-Za-z_") >> *x3::char_("A-Za-z_0-9")];

auto const enum_finder = x3::rule<class enum_finder_id, std::vector<std::string>>{"enum_finder"}
                       = *(("enum" >> identifier) | any);

When I parse a string with this enum_finder into a std::vector, the vector also contains a lot of empty strings. Why is this parser also putting empty strings into the vector?


Solution

  • I've assumed you want to parse "enum <identifier>" out of free-form text, ignoring whitespace.

    What you really want is for ("enum" >> identifier | any) to synthesize an optional<string>. Sadly, what you get is variant<string, unused_type> or somesuch.

    The same happens when you wrap any with x3::omit[any] - it's still the same unused_type.

    Plan B: Since you're really just parsing repeated enum-ids separated by "anything", why not use the list operator:

         ("enum" >> identifier) % any
    

    This works a little. Now some tweaking: let's avoid eating "any" character by character. In fact, we can likely just consume whole whitespace-delimited words (note that +~space is equivalent to +graph):

    auto const any = x3::rule<class any_id>{"any"}
                   = x3::lexeme [+x3::graph];
    

    Next, to allow multiple bogus words in a row, there's a trick: make the list's subject parser optional:

           -("enum" >> identifier) % any;
    

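    This parses the inputs it was designed for. One caveat is visible in the output: the `any` separator is mandatory between two matches, so in "enum one enum two" the second "enum" keyword is consumed as the separator and only "one" is captured. See a full demo: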

    DEMO


    #include <boost/spirit/home/x3.hpp>
    #include <iostream>
    #include <string>
    #include <vector>
    namespace x3 = boost::spirit::x3;
    
    namespace parser {
        using namespace x3;
        auto any         = lexeme [+~space];  // one whitespace-delimited word
        auto identifier  = lexeme [char_("A-Za-z_") >> *char_("A-Za-z_0-9")];
        auto enum_finder = -("enum" >> identifier) % any;  // optional subject allows runs of bogus words
    }
    
    int main() {
    
        for (std::string input : {
                "",
                "  ",
                "bogus",
                "enum one",
                "enum one enum two",
                "enum one bogus bogus more bogus enum two !@#!@#Yay",
            })
        {
            auto f = input.begin(), l = input.end();
            std::cout << "------------ parsing '" << input << "'\n";
    
            std::vector<std::string> data;
            if (phrase_parse(f, l, parser::enum_finder, x3::space, data))
            {
                std::cout << "parsed " << data.size() << " elements:\n";
                for (auto& el : data)
                    std::cout << "\t" << el << "\n";
            } else {
                std::cout << "Parse failure\n";
            }
    
            if (f!=l)
                std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
        }
    
    }
    

    Prints:

    ------------ parsing ''
    parsed 0 elements:
    ------------ parsing '  '
    parsed 0 elements:
    ------------ parsing 'bogus'
    parsed 0 elements:
    ------------ parsing 'enum one'
    parsed 1 elements:
        one
    ------------ parsing 'enum one enum two'
    parsed 1 elements:
        one
    ------------ parsing 'enum one bogus bogus more bogus enum two !@#!@#Yay'
    parsed 2 elements:
        one
        two