Combining positive and negative lookahead in python


I'm trying to extract tokens that satisfy several conditions. I'm using lookaheads to implement the following two:

  1. The tokens must be numeric or alphanumeric (i.e., they must contain at least one digit). They can contain a few special characters such as '-', '/', '\', '.', '_', etc.

I want to match strings like: 165271, agya678, yah@123, kj*12-

  2. The tokens can't contain consecutive special characters, as in ajh12-&

I don't want to match strings like: ajh12-&, 671%&i^

I'm using a positive lookahead for the first condition: (?=\w*\d\w*) and a negative lookahead for the second condition: (?!=[\_\.\:\;\-\\\/\@\+]{2})

I'm not sure how to combine these two look-ahead conditions.

Any suggestions would be helpful. Thanks in advance.

Edit 1 :

I would also like to extract complete tokens that are part of a larger string (i.e., they may appear in the middle of the string).

I would like to match all the tokens in the string: 165271 agya678 yah@123 kj*12-

and none of the tokens (not even a part of a token) in the string: ajh12-& 671%&i^

To force the regex to consider whole tokens, I've also used \b in the above regexes: (?=\b\w*\d\w*\b) and (?!=\b[\_\.\:\;\-\\\/\@\+]{2}\b)


Solution

  • You can use

    ^(?!.*[_.:;\-\\\/@+*]{2})(?=[^\d\n]*\d)[\w.:;\-\\\/@+*]+$
    

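    The positive lookahead (?=[^\d\n]*\d) uses a negated character class to match any char except a digit or a newline, and then matches a digit. The negative lookahead (?!.*[_.:;\-\\\/@+*]{2}) rejects the string if two special characters appear next to each other anywhere in it.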

    Note that you also have to add * to the character class (kj*12- contains one), and that most characters don't have to be escaped inside a character class.
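
As a quick check, here is a minimal sketch that runs the anchored pattern against the sample tokens from the question. The names TOKEN_RE and is_valid_token are made up for illustration, and the sketch assumes each string holds exactly one candidate token.

```python
import re

# Anchored pattern from the answer, with the negative lookahead written as (?!...).
# Assumption: each input string contains a single candidate token.
TOKEN_RE = re.compile(
    r"^(?!.*[_.:;\-\\\/@+*]{2})"   # no two special characters in a row
    r"(?=[^\d\n]*\d)"              # at least one digit somewhere
    r"[\w.:;\-\\\/@+*]+$"          # only word characters and the allowed specials
)

def is_valid_token(token: str) -> bool:
    """Return True if the whole string is an acceptable token."""
    return TOKEN_RE.match(token) is not None

if __name__ == "__main__":
    for sample in ["165271", "agya678", "yah@123", "kj*12-", "ajh12-&", "671%&i^"]:
        print(sample, is_valid_token(sample))
```

The first four samples are accepted. The last two are rejected because & and % are not in the allowed character class at all.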

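    Using contrast, you could also turn the first .* into a negated character class to prevent some backtracking. Note that this variant only checks the first run of special characters, so a token such as a-b--1 still passes:

    ^(?![^_.:;\-\\\/@+*\n]*[_.:;\-\\\/@+*]{2})(?=[^\d\n]*\d)[\w.:;\-\\\/@+*]+$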
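
The sketch below compares the lookahead variants on tokens with more than one run of special characters. It assumes the negated class is meant to be repeated with *. The UNROLLED pattern is not part of the original answer; it is one possible way to keep the negated class while still scanning past single special characters. All names here are illustrative.

```python
import re

SPECIAL = r"_.:;\-\\\/@+*"                  # special characters allowed in a token
TAIL = rf"(?=[^\d\n]*\d)[\w{SPECIAL}]+$"    # digit check plus the token body

# Variant 1: .* scans the whole string for two specials in a row.
DOT_STAR = re.compile(rf"^(?!.*[{SPECIAL}]{{2}})" + TAIL)

# Variant 2: negated class; only inspects the first run of specials.
NEGATED = re.compile(rf"^(?![^{SPECIAL}\n]*[{SPECIAL}]{{2}})" + TAIL)

# Variant 3 (assumption, not from the answer): also step over
# "one special followed by non-specials" before looking for a pair.
UNROLLED = re.compile(
    rf"^(?![^{SPECIAL}\n]*(?:[{SPECIAL}][^{SPECIAL}\n]+)*[{SPECIAL}]{{2}})" + TAIL
)

def accepted(pattern: re.Pattern, token: str) -> bool:
    """Return True if the pattern matches the whole token."""
    return pattern.match(token) is not None

if __name__ == "__main__":
    for token in ["ab--1", "a-b--1", "a-b-1"]:
        print(token,
              accepted(DOT_STAR, token),
              accepted(NEGATED, token),
              accepted(UNROLLED, token))
```

For a-b--1, the .* variant and the unrolled variant reject the token, while the plain negated-class variant accepts it. If tokens can contain several separated special characters, the .* form is the safer choice.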
    

    Edit

    Without the anchors, you can use whitespace boundaries to the left (?<!\S) and to the right (?!\S)

    (?<!\S)(?!\S*[_.:;\-\\\/@+*]{2})(?=[^\d\s]*\d)[\w.:;\-\\\/@+*]+(?!\S)
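
To extract tokens from a longer string, the whitespace-boundary pattern can be passed to re.findall. This is a minimal sketch using the two sample strings from the question; the names TOKEN_IN_TEXT and extract_tokens are hypothetical.

```python
import re

# Whitespace-boundary version of the pattern, with the negative lookahead as (?!...).
TOKEN_IN_TEXT = re.compile(
    r"(?<!\S)"                      # token starts at the string start or after whitespace
    r"(?!\S*[_.:;\-\\\/@+*]{2})"    # no two special characters in a row within the token
    r"(?=[^\d\s]*\d)"               # at least one digit before the next whitespace
    r"[\w.:;\-\\\/@+*]+"            # the token body
    r"(?!\S)"                       # token ends at the string end or before whitespace
)

def extract_tokens(text: str) -> list[str]:
    """Return every whitespace-delimited token that satisfies both conditions."""
    return TOKEN_IN_TEXT.findall(text)

if __name__ == "__main__":
    print(extract_tokens("165271 agya678 yah@123 kj*12-"))
    print(extract_tokens("ajh12-& 671%&i^"))
```

The first call returns all four tokens and the second returns an empty list, because neither token in that string can be matched from one whitespace boundary to the next.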
    
