Let's say we have a regex "start:(?: ([0-9]{1,2}))? ([0-9].*)".
It will match
std::string string1 = "start: 01 0ab";
and
std::string string2 = "start: 0ab";
We can also get the 2 matched string respectively.
I try to use boost::spirit::qi parser to parse string2 but it couldn't match.
qi::rule<std::string::const_iterator, std::string()> rule1 = qi::repeat(1,2)[qi::digit];
qi::rule<std::string::const_iterator, std::string()> rule2 = qi::digit >> *qi::char_;
std::vector<std::string> attr;
auto it_begin = string2.begin();
auto it_end = string2.end();
if (qi::parse(
it_begin,
it_end,
qi::lit("start:")
>> -(qi::lit(" ") >> rule1)
>> qi::lit(" ") >> rule2
>> qi::eoi,
attr))
std::cout<<"match"<<std::endl;
else
std::cout<<"not match"<<std::endl;
We can of course use a look-ahead operator to check what's behind rule1, but is there a more generic approach to implement regex operator '?' ? Thanks!
I'm not sure what's wrong with the expectation. It is the only way for otherwise ambiguous rules, since PEG grammars are always greedy.
However, maybe you didn't arrive at the most elegant form, since you were looking for something "better". Here's what I'd do.
I'd use a skipper to match spaces¹:
if (qi::phrase_parse(it_begin, it_end,
"start:" >> -rule1 >> rule2 >> qi::eoi,
qi::space, attr))
Where the rules are still lexemes (because there were declared without the skipper):
qi::rule<It, std::string()> const
rule1 = qi::digit >> qi::digit >> &qi::space,
rule2 = qi::digit >> *qi::graph;
Note
qi::graph
doesn't match whitespace, where*qi::char_
simply matches anything at all greedily.
#include <boost/spirit/include/qi.hpp>
namespace qi = boost::spirit::qi;
int main() {
using It = std::string::const_iterator;
// implicitly lexemes (no skipper in rule declaration)
qi::rule<It, std::string()> const
rule1 = qi::digit >> qi::digit >> &qi::space,
rule2 = qi::digit >> *qi::graph;
for (std::string const input : { "start: 01 0ab", "start: 0ab", }) {
std::vector<std::string> attr;
auto it_begin = input.begin();
auto it_end = input.end();
if (qi::phrase_parse(it_begin, it_end, "start:" >> -rule1 >> rule2 >> qi::eoi, qi::space, attr))
std::cout << "match\n";
else
std::cout << "not match\n";
if (it_begin!=it_end)
std::cout<<"Remaining unparsed input: '" << std::string(it_begin, it_end) << "'\n";
}
}
Prints
match
match
¹ this assumes that multiple/different whitespace is okay. If newlines should not count as whitespace, use qi::blank
instead of qi::space