I've seen a few comments here that mention that modern regular expressions go beyond what can be represented in a regular language. How is this so?
What features of modern regular expressions are not regular? Examples would be helpful.
The first thing that comes to mind is backreferences:
(\w*)\s\1
(matches a group of word characters, followed by a space character and then the same group previously matched) eg: hello hello
matches, hello world
doesn't.
This construct is not regular (ie: can't be generated by a regular grammar).
Another feature supported by Perl Compatible RegExp (PCRE) that is not regular are recursive patterns:
\((a*|(?R))*\)
This can be used to match any combination of balanced parentheses and "a"s (from wikipedia)