regex, regex-lookarounds

Match IP without port 80


I want to match all IP addresses with port different than 80.

This regex almost works:

[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+(?!:80)

with this text:

94.79.78.146:44732
172.31.11.22:80
https://14.194.34.176:443/
"172.31.11.22:80"

It matches the IP addresses well, except for 172.31.11.22:80, where it matches 172.31.11.2. Notice how the last 2 is omitted.

How come? I have [0-9]+, which is a greedy consumer, so it should match the entire number 22.


Solution

  • The whole pattern has to match. The last digit is dropped because [0-9]+ can backtrack one step so that the assertion (?!:80) becomes true: after 172.31.11.2 the next characters are 2:80, not :80.

    You can rule out that shorter match by adding a word boundary \b after the last character class that matches digits. There is no word boundary between the two 2s, so the backtracked position fails as well:

    [0-9]+\.[0-9]+\.[0-9]+\.[0-9]+\b(?!:80)
    


    If you do want to match, for example, port 8000 but not port 80, add a word boundary after 80 as well: (?!:80\b)

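    As you are using grep -Po according to the comments, you can also use a possessive quantifier (++) instead of a word boundary. A possessive quantifier never gives back digits it has already matched, so the engine cannot backtrack into the last number: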

    grep -Po '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]++(?!:80\b)'
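
To see the backtracking and the two word-boundary fixes side by side, here is a small Python sketch. It is only an illustration, not part of the original answer: it uses Python's re module rather than grep, and the line with port 8000 is a hypothetical addition to the sample text from the question.

```python
import re

# Sample lines from the question, plus one hypothetical line with port 8000
# (added only to show the difference between (?!:80) and (?!:80\b)).
TEXT = "\n".join([
    "94.79.78.146:44732",
    "172.31.11.22:80",
    "https://14.194.34.176:443/",
    '"172.31.11.22:80"',
    "10.0.0.5:8000",
])

IP = r"[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+"

NAIVE = IP + r"(?!:80)"        # last [0-9]+ may backtrack to satisfy the lookahead
BOUNDED = IP + r"\b(?!:80)"    # \b rejects a match that stops in the middle of a number
STRICT = IP + r"\b(?!:80\b)"   # second \b rejects port 80 only, not 8000

for name, pattern in [("naive", NAIVE), ("bounded", BOUNDED), ("strict", STRICT)]:
    print(name, re.findall(pattern, TEXT))

# naive   ['94.79.78.146', '172.31.11.2', '14.194.34.176', '172.31.11.2']
# bounded ['94.79.78.146', '14.194.34.176']
# strict  ['94.79.78.146', '14.194.34.176', '10.0.0.5']
```

The sketch sticks to \b because Python's re module only accepts the possessive ++ form from version 3.11 onwards; on such a version, IP + r"+(?!:80\b)" should behave like the grep command above.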