I am writing a negative lookbehind assertion in Python to parse a plain text file. It should behave as follows:
It must not match anything that follows http://********** (that is, anything inside a link), but it must match the pattern when it is not inside an http://* link.
Example:
http://www.test.com/aa4 cd6
bx2 vq9
yu9 http://www.bh9.com/cj3
Expected matches: cd6, bx2, vq9 and yu9
So I tried regexps like
r'(?<!http://(.*))([a-z][a-z][0-9])'
r'(?<!http://*)([a-z][a-z][0-9])'
They did not work.
How can I use .* or do a similar operation inside a negative lookbehind assertion in Python?
Problem: Python's re module does not allow a lookbehind pattern whose length is not fixed, so .* and * cannot be used inside (?<!...).
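To see the restriction in practice, here is a minimal sketch that tries to compile the two patterns from the question and the fixed-width one. The helper name `compiles` is made up for this illustration; it only reports whether `re.compile` accepts the pattern.

```python
import re

def compiles(pattern):
    """Return True if the re module accepts the pattern, False otherwise."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        # Raised for variable-width lookbehind, among other pattern errors
        return False

# Both attempts from the question put a variable-width pattern in the lookbehind
print(compiles(r'(?<!http://(.*))([a-z][a-z][0-9])'))  # False
print(compiles(r'(?<!http://*)([a-z][a-z][0-9])'))     # False

# A single character class has a fixed width of one, so this is accepted
print(compiles(r'(?<![./])[a-z][a-z][0-9]'))           # True
```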
Quick hack: Perhaps the following regexp does the job?
r'(?<![./])[a-z][a-z][0-9]'
It works like this:
>>> import re
>>> text = """http://www.test.com/aa4
... bx2 vq9
... http://www.bh9.com/cj3
... """
>>> re.findall(r'(?<![./])[a-z][a-z][0-9]', text)
['bx2', 'vq9']
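As a rough check of where the quick hack holds and where it does not, the sketch below runs it on the exact sample from the question and on one extra URL. The second input is a hypothetical URL invented for this illustration; it shows that the hack only inspects the single character before the match, so a longer word inside a link can still slip through.

```python
import re

HACK_RE = re.compile(r'(?<![./])[a-z][a-z][0-9]')

# The sample text from the question
question_sample = """http://www.test.com/aa4 cd6
bx2 vq9
yu9 http://www.bh9.com/cj3
"""
print(HACK_RE.findall(question_sample))  # ['cd6', 'bx2', 'vq9', 'yu9']

# Hypothetical URL: "bc4" is preceded by "a", not by "." or "/",
# so the lookbehind does not reject it even though it is inside a link
print(HACK_RE.findall("http://www.test.com/abc4"))  # ['bc4']
```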
Another solution: use a regexp that matches URLs to remove all URLs from your string, and then search for r'[a-z][a-z][0-9]'.
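A minimal sketch of that second approach follows. The URL pattern here is an assumption (http:// or https:// followed by any run of non-whitespace characters), and the function name `find_tokens_outside_urls` is made up for this example; a real URL matcher may need to be stricter.

```python
import re

# Assumed URL shape: scheme followed by everything up to the next whitespace
URL_RE = re.compile(r'https?://\S+')
TOKEN_RE = re.compile(r'[a-z][a-z][0-9]')

def find_tokens_outside_urls(text):
    """Remove URLs first, then search the remaining text for the pattern."""
    # Substitute a space so the words on either side of a URL do not merge
    without_urls = URL_RE.sub(' ', text)
    return TOKEN_RE.findall(without_urls)

sample = """http://www.test.com/aa4 cd6
bx2 vq9
yu9 http://www.bh9.com/cj3
"""
print(find_tokens_outside_urls(sample))  # ['cd6', 'bx2', 'vq9', 'yu9']
```

Because the URLs are gone before the search runs, no lookbehind is needed at all, and the result does not depend on which character happens to precede the match.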