Let's say I have a string which might contain one or more IP addresses. How can I match all of them, and only the valid ones, using a regex in Ruby?
Currently, my solution looks like this:
IP_ADDR_REGEX = %r{
\b
(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.
(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.
(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.
(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
\b
}x
This works well when IPs are separated by spaces, e.g. it extracts 192.168.1.1 from the text bla bla 192.168.1.1 bla. However, it still extracts 192.168.1.1 in this case: bla bla 192.168.1.1.1.1 bla.
How can I make it not match such cases? I.e. when the text contains 192.168.1.1.1.1, my regex should not return a match. I've looked at many solutions to this problem but could not find what I want. I also tried to figure out the solution myself by matching only whitespace at the end (because \b also matches between a digit and a . character), but I could not make it work.
Thanks
You may solve it by adding lookarounds that fail the match if the IP-like string is preceded by a digit and a dot or is followed by a dot and a digit:
IP_ADDR_REGEX = %r{
\b # Word boundary
(?<!\d\.) # Negative lookbehind: no "X." before
(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.
(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.
(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.
(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
\b # Word boundary
(?!\.\d) # Negative lookahead: no ".X" after
}x
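As a quick check of the lookaround version, here is a minimal sketch that runs it against the strings from the question. The constant name STRICT_IP_REGEX and the sample string with several hosts are made up for illustration.

```ruby
# Hypothetical constant name; the pattern is the lookaround version above.
STRICT_IP_REGEX = %r{
  \b
  (?<!\d\.)   # not preceded by a digit and a dot
  (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.
  (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.
  (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.
  (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
  \b
  (?!\.\d)    # not followed by a dot and a digit
}x

plain    = "bla bla 192.168.1.1 bla"
too_long = "bla bla 192.168.1.1.1.1 bla"
mixed    = "hosts: 10.0.0.1, 192.168.1.1.1.1 and 172.16.254.3."  # made-up sample

plain_matches    = plain.scan(STRICT_IP_REGEX)     # => ["192.168.1.1"]
too_long_matches = too_long.scan(STRICT_IP_REGEX)  # => []
mixed_matches    = mixed.scan(STRICT_IP_REGEX)     # => ["10.0.0.1", "172.16.254.3"]

p plain_matches, too_long_matches, mixed_matches
```

Note that the trailing comma and the sentence-ending dot in the last sample do not block a match, since the lookahead only rejects a dot that is followed by a digit.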
Note that if you only want to match whitespace-separated substrings, use:
IP_ADDR_REGEX = %r{
(?<!\S) # Position not preceded with non-whitespace char
(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.
(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.
(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.
(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
(?!\S) # Position not followed with non-whitespace char
}x
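The whitespace-bounded variant is stricter than the lookaround one, which the following sketch tries to show. The constant name WS_IP_REGEX and the sample text are assumptions made for this example only.

```ruby
# Hypothetical constant name; the pattern is the whitespace-bounded version above.
WS_IP_REGEX = %r{
  (?<!\S)   # start of string or preceded by whitespace
  (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.
  (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.
  (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.
  (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
  (?!\S)    # end of string or followed by whitespace
}x

# Made-up sample: the second address is followed by a comma,
# the third one has too many octets.
text = "a 10.0.0.1 b 10.0.0.2, c 192.168.1.1.1.1"

ws_matches = text.scan(WS_IP_REGEX)  # => ["10.0.0.1"]
p ws_matches
```

Here 10.0.0.2 is skipped because of the comma right after it, so this variant only fits input where addresses are never adjacent to punctuation.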
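Note that the (?:...) non-capturing groups make the regex easier to use with the String#scan method to collect all matches from a string: when a pattern contains capturing groups, scan returns the captured groups instead of the whole matches.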
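To illustrate the String#scan point, this sketch compares the capturing pattern from the question with a non-capturing equivalent. The constant names and the sample string are hypothetical, and the lookarounds are left out to keep the comparison short.

```ruby
# Capturing groups, as in the question (hypothetical constant name).
CAPTURING_IP_REGEX = %r{
  \b
  (25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.
  (25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.
  (25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.
  (25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
  \b
}x

# Same pattern with non-capturing groups (hypothetical constant name).
NON_CAPTURING_IP_REGEX = %r{
  \b
  (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.
  (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.
  (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.
  (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
  \b
}x

sample = "ip 192.168.1.1 and 10.0.0.1"  # made-up sample text

# With capturing groups, scan yields one array of octets per match.
octets = sample.scan(CAPTURING_IP_REGEX)
# => [["192", "168", "1", "1"], ["10", "0", "0", "1"]]

# With non-capturing groups, scan yields the full matched strings.
addresses = sample.scan(NON_CAPTURING_IP_REGEX)
# => ["192.168.1.1", "10.0.0.1"]

p octets, addresses
```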