Equivalent to (.*) in negative look behind assertion Regex Python


I am writing a negative lookbehind assertion expression in Python which performs the following function to parse a plain text file:

It should not match the pattern anywhere inside an http://* link, but should match it everywhere else.

Example:
http://www.test.com/aa4   cd6
bx2 vq9 
yu9 http://www.bh9.com/cj3

Expected matches: cd6, bx2, vq9 and yu9

So I tried regexps like

r'(?<!http://(.*))([a-z][a-z][0-9])'
r'(?<!http://*)([a-z][a-z][0-9])'

They did not work.
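
A minimal reproduction of the failure, assuming the standard library `re` module: both patterns are rejected at compile time, before any text is searched. The exact wording of the error message may differ between Python versions.

```python
import re

# The two patterns from the question; both contain a variable-width
# lookbehind ((.*) in the first, /* in the second).
patterns = [
    r'(?<!http://(.*))([a-z][a-z][0-9])',
    r'(?<!http://*)([a-z][a-z][0-9])',
]

errors = []
for pattern in patterns:
    try:
        re.compile(pattern)
    except re.error as exc:
        # The re module refuses to compile the pattern at all.
        errors.append(str(exc))
        print(f'{pattern!r} rejected: {exc}')
```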

How can I use .* (or do something equivalent) inside a negative lookbehind assertion in Python?


Solution

  • Problem: Python's re module requires a lookbehind pattern to match strings of a fixed length, so quantifiers such as .* are not allowed inside it.

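    Quick hack: Perhaps the following regexp does the job? It rejects any candidate that is directly preceded by '.' or '/', which covers URLs like the ones in your example: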

    r'(?<![./])[a-z][a-z][0-9]'
    

    It works like this:

    >>> import re
    >>> text = """http://www.test.com/aa4
    ... bx2 vq9 
    ... http://www.bh9.com/cj3
    ... """
    >>> re.findall(r'(?<![./])[a-z][a-z][0-9]', text)
    ['bx2', 'vq9']
    

    Or, as another solution, use a regexp that matches URLs to remove all URLs from your string, and then search what is left for r'[a-z][a-z][0-9]'.
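
The second suggestion (remove the URLs first, then search) could look roughly like the sketch below. The URL pattern `https?://\S+` and the helper name `find_tokens_outside_urls` are assumptions made for illustration. Real-world URLs may need a stricter pattern.

```python
import re

# Assumed, deliberately simple URL pattern: "http://" or "https://"
# followed by any run of non-whitespace characters.
URL_RE = re.compile(r'https?://\S+')
TOKEN_RE = re.compile(r'[a-z][a-z][0-9]')

def find_tokens_outside_urls(text):
    """Return two-letters-plus-digit tokens that are not part of a URL."""
    # Replace each URL with a space so the text on either side does not merge.
    without_urls = URL_RE.sub(' ', text)
    return TOKEN_RE.findall(without_urls)

sample = """http://www.test.com/aa4   cd6
bx2 vq9
yu9 http://www.bh9.com/cj3
"""
print(find_tokens_outside_urls(sample))  # ['cd6', 'bx2', 'vq9', 'yu9']
```

Unlike the lookbehind hack, this does not depend on which character precedes a candidate, so it also finds cd6 and yu9 from the example in the question.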