c++ qt qregularexpression

Can't find any explanation for QRegularExpression behavior. It works, but it shouldn't


As the title implies, I have a code snippet using QRegularExpression that works. It does what it is supposed to do, causes no errors, and everything is fine.

Why am I posting the question? Everything I have found so far implies that my expression should not work, but it does.

The main point of my question is the \- escape sequence.

I know that it is not defined. During compilation I get warning: unknown escape sequence: '\-'. This warning is actually expected.

Now consider the following code snippet. Don't pay too much attention to the expression itself. It is Russian, and unfortunately this is the expression on which I noticed this strange behavior.

I am not posting anything else because, as strange as it sounds, it works as desired.

I want to understand why it works, considering that I get the warning.

The expression is below.

// Capture Russian endings
QRegularExpression RU_ENDINGS("([а-я\-]+[бвгджзклмнпрстфхчцшщ])([еиоы][й]|[аия][я]|[иую][ю]|[еиоы][е]|[аоеиы][м][иу]|[ое][г][о]|(?<!ост)и?[аеиоыя]м|ост[а-яё]{1,3}|(?<!остиям)(?>и|ь.?)|[ао]в|н[аеио]|с[ая]|[ео][вк]|[иы]х|[ие]ну|[иуя]т|(?<![аеёиоуыэюя]{2})[аеёоуыэюя]+|и{2})$", QRegularExpression::UseUnicodePropertiesOption | QRegularExpression::MultilineOption);

As I said, I get the desired behavior. In Russian words that contain the symbol '-', the symbol is consumed by the [а-я\-]+ part. If \- is removed from the character class, the - is not consumed.

Everything I found suggests it should not work, but it does.

UPDATE

In the suggested duplicate, the regex did not work.

My question clearly states that my regex works. I just could not figure out why it worked as desired, considering the warning I got during compilation. All the provided code was used as is and worked.

More to the point, the question has nothing to do with std::regex. A correct answer with the correct explanation has also already been given below.

The question might be a duplicate, but it certainly is not a duplicate of the suggested question.


Solution

  • The compiler doesn't recognize the escape sequence \-. The C++ standard leaves such escapes conditionally supported, and the compiler producing your warning simply puts a plain - into the string and issues the warning.

    Your regex engine thus sees [а-я-]. The way regex character classes work, a - at the very end of the class is not special: it is a literal hyphen. So, for the engine, there is no difference between [а-я\-] and [а-я-].

    Thus, the expression works as you want it to.

    You can try this out for yourself by writing a small program that compares the results for these three expressions:

    QRegularExpression escaped("[a-z\\-]");     // engine sees [a-z\-]
    QRegularExpression bad_escaped("[a-z\-]");  // compiler warning; engine sees [a-z-]
    QRegularExpression unescaped("[a-z-]");     // engine sees [a-z-]
    

    Match these three against a few test strings, in particular the string "-", and you'll find that they all behave the same. The only difference is the compiler warning on the second one.
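To make the point about "what the regex engine sees" concrete, here is a minimal sketch that uses only the standard library and looks at the pattern strings themselves, before any regex engine is involved. The helper names (escapedPattern, rawPattern, unescapedPattern, checkPatterns) are hypothetical and exist only for this illustration. The badly escaped "\-" spelling is deliberately left out of the compiled code, since its handling is up to the compiler.

```cpp
#include <cassert>
#include <string>

// Backslash escaped for the C++ compiler.
// The engine receives the seven characters [a-z\-]
inline std::string escapedPattern() { return "[a-z\\-]"; }

// The same pattern as a raw string literal (C++11).
// Nothing is interpreted by the compiler, so no escape warning can occur.
inline std::string rawPattern() { return R"([a-z\-])"; }

// Bare '-' as the last character of the class.
// The engine receives the six characters [a-z-]
inline std::string unescapedPattern() { return "[a-z-]"; }

// Sanity checks on the strings only; no regex matching happens here.
inline void checkPatterns() {
    // Both correctly escaped spellings produce the identical string.
    assert(escapedPattern() == rawPattern());
    // The backslash really is part of the string handed to the engine.
    assert(escapedPattern().find('\\') != std::string::npos);
    // The unescaped spelling contains no backslash at all.
    assert(unescapedPattern().find('\\') == std::string::npos);
}
```

If the goal is to keep the explicit \- in the pattern without the warning, either of the first two spellings should work when passed to QRegularExpression, for example QRegularExpression re(R"([a-z\-])"), assuming implicit conversion from string literals to QString is enabled in your build.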