
IPv6 address into compressed form in Java


I have used Inet6Address.getByName("2001:db8:0:0:0:0:2:1").toString() to compress an IPv6 address, but the output is 2001:db8:0:0:0:0:2:1, and I need 2001:db8::2:1. Basically, the compressed output should follow the RFC 5952 standard, that is:

  1. Shorten as Much as Possible: For example, 2001:db8:0:0:0:0:2:1 must be shortened to
    2001:db8::2:1. Likewise, 2001:db8::0:1 is not acceptable, because the symbol "::" could have been used to produce a shorter representation, 2001:db8::1.

  2. Handling One 16-Bit 0 Field: The symbol "::" MUST NOT be used to shorten just one 16-bit 0 field. For example, the representation 2001:db8:0:1:1:1:1:1 is correct, but 2001:db8::1:1:1:1:1 is not correct.

  3. Choice in Placement of "::": When there is an alternative choice in the placement of a "::", the longest run of consecutive 16-bit 0 fields MUST be shortened (i.e., the sequence with three consecutive zero fields is shortened in 2001:0:0:1:0:0:0:1). When the lengths of the consecutive 16-bit 0 fields are equal (i.e., 2001:db8:0:0:1:0:0:1), the first sequence of zero bits MUST be shortened. For example, 2001:db8::1:0:0:1 is the correct representation.
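The three rules above amount to a small algorithm: split the address into eight 16-bit fields, find the longest run of two or more zero fields (the first one on a tie), and replace only that run with "::". Java's built-in formatting does not do this: InetAddress.toString() returns "hostname/literal" (for a parsed literal the host name is empty, so it prints "/2001:db8:0:0:0:0:2:1"), and getHostAddress() returns the literal without any "::". If an extra dependency is acceptable, Guava's InetAddresses.toAddrString(InetAddress) is documented to follow RFC 5952 section 4. Otherwise, here is a minimal dependency-free sketch; the class Rfc5952 and its methods format and compress are hypothetical names, and it assumes a plain IPv6 literal as input (an IPv4-mapped literal such as ::ffff:1.2.3.4 is returned by getByName as an Inet4Address and is rejected here).

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public final class Rfc5952 {

    // Formats a 16-byte IPv6 address per the rules above: "::" replaces only the
    // longest run of two or more zero fields, and the first run wins a tie.
    // Integer.toHexString gives lowercase hex without leading zeros.
    public static String format(byte[] addr) {
        if (addr.length != 16) {
            throw new IllegalArgumentException("expected 16 bytes, got " + addr.length);
        }
        int[] fields = new int[8];
        for (int i = 0; i < 8; i++) {
            fields[i] = ((addr[2 * i] & 0xff) << 8) | (addr[2 * i + 1] & 0xff);
        }

        // Locate the longest zero run; the strict ">" keeps the first run on ties.
        int bestStart = -1;
        int bestLen = 0;
        int i = 0;
        while (i < 8) {
            if (fields[i] != 0) {
                i++;
                continue;
            }
            int start = i;
            while (i < 8 && fields[i] == 0) {
                i++;
            }
            if (i - start > bestLen) {
                bestStart = start;
                bestLen = i - start;
            }
        }
        if (bestLen < 2) {
            bestStart = -1; // a single 0 field is never replaced by "::"
        }

        StringBuilder sb = new StringBuilder();
        for (int f = 0; f < 8; f++) {
            if (f == bestStart) {
                sb.append("::");
                f += bestLen - 1; // skip the rest of the zero run
                continue;
            }
            // Separate fields with ':', except at the start or right after "::".
            if (sb.length() > 0 && sb.charAt(sb.length() - 1) != ':') {
                sb.append(':');
            }
            sb.append(Integer.toHexString(fields[f]));
        }
        return sb.toString();
    }

    // Parses an IPv6 literal (no DNS lookup is made for literals) and reformats it.
    public static String compress(String literal) {
        try {
            return format(InetAddress.getByName(literal).getAddress());
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not a valid address: " + literal, e);
        }
    }

    public static void main(String[] args) throws UnknownHostException {
        InetAddress addr = InetAddress.getByName("2001:db8:0:0:0:0:2:1");
        System.out.println(addr);                      // /2001:db8:0:0:0:0:2:1
        System.out.println(addr.getHostAddress());     // 2001:db8:0:0:0:0:2:1
        System.out.println(format(addr.getAddress())); // 2001:db8::2:1
    }
}
```

Working on the 16 bytes from getAddress() rather than on the string means any valid textual input (uppercase, leading zeros, already partly compressed) is normalized before the zero runs are counted.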

I have also checked another post on Stack Overflow, but it did not cover these conditions (for example, the choice of placement of "::").

Is there any Java library to handle this? Could anyone please help me?

Thanks in advance.


Solution

  • How about this?

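    String prefixed = (":" + subjectString).replaceAll("((?::0\\b){2,}):?(?!\\S*\\b\\1:0\\b)(\\S*)", "::$2"); // leading ":" lets a zero run in the first field match in full
    String resultString = prefixed.startsWith("::") ? prefixed : prefixed.substring(1); // drop the helper ":" unless "::" now starts the address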
    

    Explanation without Java double-backslash hell:

    (       # Match and capture in backreference 1:
     (?:    #  Match this group:
      :0    #  :0
      \b    #  word boundary
     ){2,}  # twice or more
    )       # End of capturing group 1
    :?      # Match a : if present (not at the end of the address)
    (?!     # Now assert that we can't match the following here:
     \S*    #  Any non-space character sequence
     \b     #  word boundary
     \1     #  the previous match
     :0     #  followed by another :0
     \b     #  word boundary
    )       # End of lookahead. This ensures that there is not a longer
            # sequence of ":0"s in this address.
    (\S*)   # Capture the rest of the address in backreference 2.
            # This is necessary to jump over any sequences of ":0"s
            # that are of the same length as the first one.
    

    Input:

    2001:db8:0:0:0:0:2:1
    2001:db8:0:1:1:1:1:1
    2001:0:0:1:0:0:0:1
    2001:db8:0:0:1:0:0:1
    2001:db8:0:0:1:0:0:0
    

    Output:

    2001:db8::2:1
    2001:db8:0:1:1:1:1:1
    2001:0:0:1::1
    2001:db8::1:0:0:1
    2001:db8:0:0:1::
    

    (I hope the last example is correct - or is there another rule if the address ends in 0?)
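The trailing case is consistent with RFC 5952: it has no separate rule for a zero run at the end, so 2001:db8:0:0:1:: follows directly from rules 1 to 3. One case the regex needs help with is a run that starts in the first field, because that field has no ":" in front of it and so is not counted as ":0". A version that only patches a leading "0::" with replaceFirst undercounts such a run by one; for example, 0:0:0:1:0:0:0:1 becomes 0:0:0:1::1 instead of ::1:0:0:0:1. Below is a runnable sketch of an assumed variant (not the original one-liner) that prepends a ":" before matching and removes it afterwards. The class name RegexCompressDemo is hypothetical, and the input is assumed to be already in getHostAddress() form: lowercase, no leading zeros, and no existing "::".

```java
public class RegexCompressDemo {

    // The same pattern as above; in a Java string literal the backslashes are doubled.
    private static final String ZERO_RUN = "((?::0\\b){2,}):?(?!\\S*\\b\\1:0\\b)(\\S*)";

    // Assumes getHostAddress()-style input: lowercase, no leading zeros, no "::".
    public static String compress(String address) {
        // Prepend ":" so a zero run starting in the first field is seen as ":0:0...".
        // (\S*) swallows the rest of the address, so at most one replacement happens.
        String s = (":" + address).replaceAll(ZERO_RUN, "::$2");
        // Remove the helper ":" again unless the address now begins with "::".
        return s.startsWith("::") ? s : s.substring(1);
    }

    public static void main(String[] args) {
        String[] inputs = {
            "2001:db8:0:0:0:0:2:1",
            "2001:db8:0:1:1:1:1:1",
            "2001:0:0:1:0:0:0:1",
            "2001:db8:0:0:1:0:0:1",
            "2001:db8:0:0:1:0:0:0",
            "0:0:0:1:0:0:0:1"
        };
        for (String in : inputs) {
            System.out.println(in + " -> " + compress(in));
        }
    }
}
```

The first five inputs give the same output as the table above; the last one gives ::1:0:0:0:1, because the two equal runs of three are now compared correctly and the first one wins.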