Tags: ruby, regex, ipv4

Match all IP addresses in text with regex


Let's say I have a string which might contain one or more IP addresses. How can I match all of them, and only the valid ones, using a regex in Ruby?

Currently, my solution looks like this:

IP_ADDR_REGEX = %r{
  \b
  (25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.
  (25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.
  (25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.
  (25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
  \b
}x

This works well when IPs are separated by spaces, e.g. it extracts 192.168.1.1 from the text bla bla 192.168.1.1 bla. However, it still extracts 192.168.1.1 in this case: bla bla 192.168.1.1.1.1 bla.

How can I make it not match in such cases? That is, when the text contains 192.168.1.1.1.1, my regex should not return a match. I've looked at many solutions to this problem but could not find what I want. I also tried to figure it out myself by matching only whitespace at the end (because \b also matches at the position between a digit and a . character), but I could not make it work. Thanks
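
For reference, the behavior described above can be reproduced with a short script. This is only a sketch: the regex is the one from the question, the two sample strings are the ones quoted above, and first_ip is a hypothetical helper name introduced here for illustration.

```ruby
# Regex from the question, unchanged (capturing groups, \b on both sides).
IP_ADDR_REGEX = %r{
  \b
  (25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.
  (25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.
  (25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.
  (25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
  \b
}x

# Hypothetical helper: returns the first full match as a String, or nil.
def first_ip(text)
  text[IP_ADDR_REGEX]
end

p first_ip("bla bla 192.168.1.1 bla")      # => "192.168.1.1" (wanted)
p first_ip("bla bla 192.168.1.1.1.1 bla")  # => "192.168.1.1" (not wanted)
```

The second call still matches because the position between the last 1 of 192.168.1.1 and the following dot is a word boundary, so the trailing \b succeeds there.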


Solution

  • You can solve this by adding lookarounds that fail the match if the IP-like string is preceded by a digit and a dot, or followed by a dot and a digit:

    IP_ADDR_REGEX = %r{
      \b                                        # Word boundary
      (?<!\d\.)                                 # Negative lookbehind: no "X." before
      (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.
      (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.
      (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.
      (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
      \b                                        # Word boundary
      (?!\.\d)                                  # Negative lookahead: no ".X" after
    }x
    


    Note that if you only want to match whitespace-separated substrings, use:

    IP_ADDR_REGEX = %r{
      (?<!\S)                                    # Position not preceded with non-whitespace char
      (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.
      (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.
      (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.
      (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
      (?!\S)                                    # Position not followed with non-whitespace char
    }x
    


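    Note that the non-capturing groups (?:...) make the regex easier to use with String#scan: when a pattern contains capturing groups, scan returns an array of the group captures for each match instead of the matched substring.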
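
As a rough illustration of the points above, the sketch below runs String#scan with both variants and with a capturing version for contrast. The names OCTET, STRICT_IP, WS_IP, CAPTURING_IP and SAMPLE are assumptions made for this example, not part of the original answer; the octet pattern is factored into a constant only to keep the listing short.

```ruby
# Octet pattern from the answer, factored out (hypothetical constant name).
OCTET = /(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)/

# Lookaround variant: no "digit." before, no ".digit" after.
STRICT_IP = %r{
  \b (?<!\d\.)
  #{OCTET}\.#{OCTET}\.#{OCTET}\.#{OCTET}
  \b (?!\.\d)
}x

# Whitespace-delimited variant.
WS_IP = %r{
  (?<!\S)
  #{OCTET}\.#{OCTET}\.#{OCTET}\.#{OCTET}
  (?!\S)
}x

# Same as STRICT_IP but with capturing groups, for contrast only.
CAPTURING_IP = %r{
  \b (?<!\d\.)
  (#{OCTET})\.(#{OCTET})\.(#{OCTET})\.(#{OCTET})
  \b (?!\.\d)
}x

# Made-up sample text for the demonstration.
SAMPLE = "a 10.0.0.1 b 192.168.1.1.1.1 c (172.16.0.5) d 256.1.1.1"

p SAMPLE.scan(STRICT_IP)     # => ["10.0.0.1", "172.16.0.5"]
p SAMPLE.scan(WS_IP)         # => ["10.0.0.1"]
p SAMPLE.scan(CAPTURING_IP)  # => [["10", "0", "0", "1"], ["172", "16", "0", "5"]]
```

The lookaround variant accepts an address wrapped in punctuation such as parentheses, while the whitespace variant rejects it. Which one fits depends on how the addresses are delimited in your input.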