By 'basic', I mean not using these regex syntax / features:
If I only use "capturing groups" along with those meta-characters (e.g. \s
, \d
), can I get all the job done?
Suppose we wish to match a string that begins 'dog'
and ends 'cat'
that does not contain 'pig'
. We can do that with a regular expression that employs the tempered greedy token technique, which employs a negative lookahead:
dog(?:(?!pig).)*cat
Demo.
While it is a logical impossibility to prove that this task cannot be done with a regular expression that does not contain a lookaround, I am not aware of such an expression. Readers are invited to suggest one that does.
This regular expression can be broken down as follows.
dog # match 'dog'
(?: # begin a non-capture group
(?! # begin a negative lookahead
pig # match 'pig'
) # end the negative lookahead
. # match any character other than a line terminator
) # end the non-capture group
* # execute the preceding non-capture group zero or more times
cat # match 'cat'
Come to think of it, one could use an alternation with all permutations of all numbers (up to the the string size less 6) of all permissible characters that forms a string that does not contain 'pig'
: dogcat|dogacat|dogbcat| ... |dogaabcat|dogabacat| ... |dogaZrnpiht0xcat|...
but with longer strings this could be somewhat demanding computationally.