c++ regex string reverse reverse-iterator

How Can I Use a Regex on the Reverse of a string?


I want to use a regex on the reverse of a string.

I can do the following but all my sub_matches are reversed:

string foo("lorem ipsum");
match_results<string::reverse_iterator> sm;

if (regex_match(foo.rbegin(), foo.rend(), sm, regex("(\\w+)\\s+(\\w+)"))) {
    cout << sm[1] << ' ' << sm[2] << endl;
}
else {
    cout << "bad\n";
}

[Live example]

What I want is to get out:

ipsum lorem

Is there any provision for getting the sub-matches that are not reversed? That is, any provision beyond reversing the strings after they're matched like this:

string first(sm[1]);
string second(sm[2]);

reverse(first.begin(), first.end());
reverse(second.begin(), second.end());

cout << first << ' ' << second << endl;

EDIT:

It has been suggested that I update the question to clarify what I want:

Running the regex backwards on the string is not about reversing the order in which the matches are found. The regex is far more complex than would be valuable to post here, but running it backwards saves me from needing a lookahead. This question is about the handling of sub-matches obtained from a match_results<string::reverse_iterator>. I need to be able to get them out as they were in the input, here foo. I don't want to have to construct a temporary string and run reverse on it for each sub-match. How can I avoid doing this?


Solution

  • This is absolutely possible! The key is that a sub_match inherits from pair<BidirIt, BidirIt>. Since the sub_matches here come from match_results<string::reverse_iterator> sm, the first and second members of that pair are string::reverse_iterators.

    So for any given sub_match from sm you can get the forward range from its second.base() to its first.base(). You don't have to construct strings to stream ranges, but you will need to construct an ostream_iterator:

    ostream_iterator<char> output(cout);
    
    copy(sm[1].second.base(), sm[1].first.base(), output);
    output = ' ';
    copy(sm[2].second.base(), sm[2].first.base(), output);
    

    Take heart though, there is a better solution on the horizon! This answer discusses string_literals. As of right now no action has been taken on them, but they have made it into the "Evolution Subgroup".
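
For reference, the base() approach from this answer can be wrapped up as a small sketch. The names write_forward and words_from_reverse_match are hypothetical helpers invented for illustration, not standard library functions, and the sketch uses const_reverse_iterator (via crbegin/crend) rather than the reverse_iterator in the question so that it can take the input by const reference. It writes into an ostringstream only so the result is easy to inspect; the same copy works directly on cout.

```cpp
#include <algorithm>
#include <iterator>
#include <ostream>
#include <regex>
#include <sstream>
#include <string>

using RevIt = std::string::const_reverse_iterator;

// Hypothetical helper: writes one sub-match of a reverse-iterator match
// to `out` in the original (forward) character order.
// base() converts each reverse iterator back to a forward iterator, so
// [second.base(), first.base()) is the forward range; no temporary
// string is built and nothing is reversed.
void write_forward(const std::sub_match<RevIt>& sub, std::ostream& out) {
    std::copy(sub.second.base(), sub.first.base(),
              std::ostream_iterator<char>(out));
}

// Hypothetical driver: runs the question's regex over the reversed input
// and returns the two captured words spelled as they were in the input.
std::string words_from_reverse_match(const std::string& input) {
    static const std::regex pattern("(\\w+)\\s+(\\w+)");
    std::match_results<RevIt> sm;
    std::ostringstream out;

    if (std::regex_match(input.crbegin(), input.crend(), sm, pattern)) {
        write_forward(sm[1], out);
        out << ' ';
        write_forward(sm[2], out);
    } else {
        out << "bad";
    }
    return out.str();
}
```

Calling words_from_reverse_match("lorem ipsum") should yield "ipsum lorem", the output asked for in the question. The groups still come back in the order they were matched on the reversed input; only the characters inside each group are restored to their original order.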