Search code examples
c++parsingboost-spirit-qi

boost spirit parser : getting around the greedy kleene *


I have a grammar that should match sequence of characters followed a single character that is subset of the first one. For example,

boost::spirit::qi::rule<Iterator, std::string()> grammar = *char_('a', 'z') >> char_('b', 'z').

Since the kleene * is greedy operator it gobbles up everything leaving nothing for the second parser, so it fails to match strings like "abcd"

Is there any way to get around this?


Solution

  • Yes, though your sample lacks context for us to know it.

    We need to know what constitutes a complete match, because right now "b" would be a valid match, and "bb" or "bbb". So when the input is "bbb", what is going to be the match? (b, bb or bbb?).

    And when you will answer (likely) "Obviously, bbb", then what happens for "bbbb"? When do you stop accepting chars from the subset? If you want the kleene star to not be greedy, do you want it to still be greedy?

    The above dialog is annoying but the goal is to make you THINK about what you need. You do not need a non-greedy kleene-star. You probably want a validation constraint on the last char. Most likely, if the input has "bbba" you do not want to simply match "bbb", leaving "a". Instead you likely want to stop parsing because "bbba" is not a valid token.

    Assuming That...

    I'd write

    grammar = +char_("a-z") >> eps(px::back(_val) != 'a');
    

    Meaning that we accept at least 1 char as long as it matches, asserting that the last character wasn't the a.

    Live On Coliru

    #include <boost/spirit/include/qi.hpp>
    #include <boost/spirit/include/phoenix.hpp>
    #include <boost/spirit/include/phoenix_stl.hpp>
    
    namespace qi = boost::spirit::qi;
    namespace px = boost::phoenix;
    
    template <typename It>
    struct P : qi::grammar<It, std::string()>
    {
        P() : P::base_type(start) {
            using namespace qi;
            start = +char_("a-z") >> eps(px::back(_val) != 'a');
        }
      private:
        qi::rule<It, std::string()> start;
    };
    
    #include <iomanip>
    
    int main() {
        using It = std::string::const_iterator;
        P<It> const p;
    
        for (std::string const input : { "", "b", "bb", "bbb", "aaab", "a", "bbba" }) {
            std::cout << std::quoted(input) << ": ";
            std::string out;
            It f = input.begin(), l = input.end();
            if (parse(f, l, p, out)) {
                std::cout << std::quoted(out);
            } else {
                std::cout << "(failed) ";
            }
    
            if (f != l)
                std::cout << " Remaining: " << std::quoted(std::string(f,l));
            std::cout << "\n";
        }
    }
    

    Prints

    "": (failed) 
    "b": "b"
    "bb": "bb"
    "bbb": "bbb"
    "aaab": "aaab"
    "a": (failed)  Remaining: "a"
    "bbba": (failed)  Remaining: "bbba"
    

    BONUS

    A more generic, albeit less efficient, approach would be to match the leading characters with a look-ahead assertion that it isn't the last character of its kind:

    start = *(char_("a-z") >> &char_("a-z")) >> char_("b-z");
    

    A benefit here is that no Phoenix usage is required:

    Live On Coliru

    #include <boost/spirit/include/qi.hpp>
    
    namespace qi = boost::spirit::qi;
    
    template <typename It>
    struct P : qi::grammar<It, std::string()>
    {
        P() : P::base_type(start) {
            using namespace qi;
            start = *(char_("a-z") >> &char_("a-z")) >> char_("b-z");
        }
      private:
        qi::rule<It, std::string()> start;
    };
    
    #include <iomanip>
    
    int main() {
        using It = std::string::const_iterator;
        P<It> const p;
    
        for (std::string const input : { "", "b", "bb", "bbb", "aaab", "a", "bbba" }) {
            std::cout << std::quoted(input) << ": ";
            std::string out;
            It f = input.begin(), l = input.end();
            if (parse(f, l, p, out)) {
                std::cout << std::quoted(out);
            } else {
                std::cout << "(failed) ";
            }
    
            if (f != l)
                std::cout << " Remaining: " << std::quoted(std::string(f,l));
            std::cout << "\n";
        }
    }