Tags: regex, string, pumping-lemma, context-free-language

Is there a regular expression that returns the substrings of a string that do not match a given list of specific substrings?


Hi, I am wondering whether there is a regular expression that can do the following:

Select all the substrings of a string that:

  • start with & and
  • have n characters after the & (n >= 0)

AND those substrings are NOT

  • &amp;
  • &apos;
  • &lt;
  • &gt; or
  • &quot;

For example, given the string

'Stewie &amp; Brian    &partners in crime;'

is there a regex that will return only the substring &partners?

My intuition says no, because I think I would need a context-free grammar, but how can I prove that? Can I use the pumping lemma to test it?

Or does such a regex actually exist, and my intuition is just wrong?

Thank you


Solution

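  • Sure:

    &(?!(amp|apos|lt|gt|quot);)\S{4,}

    for n = 4, i.e. an & followed by at least four non-whitespace characters. For the general case, replace {4,} with {n,}; for n >= 0 that is simply \S*.

    The key here is the negative lookahead (?!(amp|apos|lt|gt|quot);), which asserts (without consuming input) that the text immediately following the & is not amp;, apos;, lt;, gt; or quot;. The alternative quot is needed as well: without it, &quot; would be returned, because quot; is itself five non-whitespace characters and satisfies \S{4,}.

    This also settles the theoretical question: no context-free grammar is needed. The strings to be matched are those consisting of an & followed by at least n non-whitespace characters, minus those that begin with one of five fixed prefixes (&amp;, &apos;, &lt;, &gt;, &quot;). Regular languages are closed under complement and intersection, so this set is regular, and the lookahead is only a convenient way to write it. The pumping lemma would not help in any case: it can only be used to show that a language is not regular, never that it is.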
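A minimal Python sketch of the pattern, using the standard re module. The helper name find_other_entities and the extra test strings are hypothetical, chosen only for illustration. The alternation is written as a non-capturing group (?:...): with a capturing group, re.findall would return the (empty) group contents instead of the whole match.

```python
import re

# Entity names to exclude: &amp; &apos; &lt; &gt; &quot;
EXCLUDED = ("amp", "apos", "lt", "gt", "quot")


def find_other_entities(text: str, n: int = 4) -> list[str]:
    """Hypothetical helper: return each '&' plus at least n following
    non-whitespace characters, unless the '&' starts an excluded entity."""
    pattern = re.compile(
        r"&"
        # Negative lookahead: fail if the text after '&' is e.g. 'amp;'.
        # (?:...) is non-capturing, so findall returns whole matches.
        + r"(?!(?:" + "|".join(EXCLUDED) + r");)"
        # Then at least n non-whitespace characters, matched greedily.
        + r"\S{" + str(n) + r",}"
    )
    return pattern.findall(text)


if __name__ == "__main__":
    print(find_other_entities("Stewie & Brian    &partners in crime;"))
    # ['&partners']
    print(find_other_entities("Tom &amp; Jerry &lt;3 &quot;hi&quot; &copy;2024"))
    # ['&copy;2024']
    print(find_other_entities("a & b", n=0))
    # ['&']
```

Removing "quot" from EXCLUDED makes the second call also return '&quot;hi&quot;', because quot;hi&quot; then satisfies \S{4,}.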