As the question implies, I have a code snippet using QRegularExpression that works. It does what it is supposed to do, causes no errors, and everything is fine.
Why am I posting the question? Well, everything I have found so far implies that my expression should not work, but... it does.
The main point of my question lies in the \- escape symbol.
I now know that it is not defined. During compilation I get
warning: unknown escape sequence: '\-'
and this warning is actually expected.
Now consider the following code snippet. Don't pay too much attention to the expression itself; it is for Russian text, but unfortunately this is the expression on which I noticed this strange behavior.
I am not posting anything else because, as strange as it sounds, it works as desired.
I actually want to understand why, considering I get the warning.
The expression is below.
//Capture russian endings
QRegularExpression RU_ENDINGS("([а-я\-]+[бвгджзклмнпрстфхчцшщ])([еиоы][й]|[аия][я]|[иую][ю]|[еиоы][е]|[аоеиы][м][иу]|[ое][г][о]|(?<!ост)и?[аеиоыя]м|ост[а-яё]{1,3}|(?<!остиям)(?>и|ь.?)|[ао]в|н[аеио]|с[ая]|[ео][вк]|[иы]х|[ие]ну|[иуя]т|(?<![аеёиоуыэюя]{2})[аеёоуыэюя]+|и{2})$", QRegularExpression::UseUnicodePropertiesOption | QRegularExpression::MultilineOption);
As I said, I get the desired behavior.
In Russian words that contain the symbol '-', the symbol is gobbled up by the [а-я\-]+ part. If the \- is not there, the - is not gobbled up.
Everything I found suggests it should not work, but it does.
UPDATE
In the suggested duplicate, the regex did not work.
My question clearly states that my regex works, I just could not figure out why it did work as desired, considering the warning I got during compilation. All the provided code was used as it is and worked.
More to the point, the question has nothing to do with std::regex. Also, a correct answer with the correct explanation has already been given below.
The question might be a duplicate, but it certainly is not a duplicate of the suggested question.
The compiler doesn't know the escape sequence \-, so it just puts a plain - in the string and issues a warning. (Strictly speaking, what happens with an unknown escape sequence is up to the compiler, but dropping the backslash is what you are seeing here.)
Your regex engine thus sees [а-я-]. And the way regex character classes work, a - at the very end of the class is not special, i.e. there is no difference between [а-я\-] and [а-я-].
Thus, the expression works as you want it to.
You can try this out for yourself by making a small program that compares the results for these three expressions, i.e.
QRegularExpression escaped("[a-z\\-]");
QRegularExpression bad_escaped("[a-z\-]");
QRegularExpression unescaped("[a-z-]");
Match these three against a few test strings, in particular the string "-", and you'll find that they all behave the same, except for the compiler warning of course.