Search code examples
regexregular-language

Aren't modern regular expression dialects regular?


I've seen a few comments here that mention that modern regular expressions go beyond what can be represented in a regular language. How is this so?

What features of modern regular expressions are not regular? Examples would be helpful.


Solution

  • The first thing that comes to mind is backreferences:

    (\w*)\s\1
    

    (matches a group of word characters, followed by a space character and then the same group previously matched) eg: hello hello matches, hello world doesn't.

    This construct is not regular (ie: can't be generated by a regular grammar).


    Another feature supported by Perl Compatible RegExp (PCRE) that is not regular are recursive patterns:

    \((a*|(?R))*\)
    

    This can be used to match any combination of balanced parentheses and "a"s (from wikipedia)