azure-data-explorer kql kusto-explorer

Extract IPv6 address from string


I currently have a KQL query that extracts an IPv4 address from a string. Similarly, I need to extract an IPv6 address from a string.

IPv4 extraction query:

datatable (ipv4text:string) 
[
'This is a random text that has IP address 128.0.0.20 that has to be extracted'
]
|extend pv4addr = extract("(([0-9]{1,3})\\.([0-9]{1,3})\\.([0-9]{1,3})\\.(([0-9]{1,3})))",1,ipv4text) 

I tried the query below, but I am not sure whether it covers all the edge cases:

datatable (ipv6:string) 
[
'IPv6 test 198.192.0.127 2345:5:2CA1:0000:0000:567:5673:256/127 in a random string'
]
|extend Ipv6Address = extract(@"(([0-9a-fA-F]{1,4}\:){7,7}[0-9a-fA-F]{1,4})|([0-9a-fA-F]{1,4}\:){1,7}\:",1,ipv6)

Can anyone provide a complete KQL query (or suggestions/hints) to extract an IPv6 address?

Thanks.


Solution

  • The regex patterns can be simplified.

    Below are the "happy paths": if an address is present, it will be extracted.
    Theoretically you might get false positives, although that is unlikely with real-life data.
    If needed, we can add some protection layers.

    datatable (ipv4text:string) 
    [
        'This is a random text that has IP address 128.0.0.20 that has to be extracted'
    ]
    | project pv4addr = extract(@"(\d{1,3}\.){3}\d{1,3}", 0, ipv4text)
    
    
    pv4addr
    128.0.0.20
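
    One possible protection layer, sketched here as an assumption rather than as part of the original answer: add word boundaries (`\b`) so the match cannot start or end inside a longer number, and keep the candidate only if `parse_ipv4()` accepts it. The extra rows and the `candidate` column are made up for illustration, and the behavior of `parse_ipv4()` on invalid input (returning null) is as I understand the documentation, so verify it on your cluster.

    ```kusto
    datatable (ipv4text:string)
    [
        'This is a random text that has IP address 128.0.0.20 that has to be extracted'
       ,'Version 1.2.3.4567 is not an address'   // hypothetical row: last token is too long
       ,'Out of range 999.0.0.1'                 // hypothetical row: octet above 255
    ]
    // \b prevents the match from starting or ending in the middle of a longer number
    | extend candidate = extract(@"\b(\d{1,3}\.){3}\d{1,3}\b", 0, ipv4text)
    // Assumption: parse_ipv4() yields null for strings that are not valid IPv4 addresses
    | project pv4addr = iff(isnotnull(parse_ipv4(candidate)), candidate, "")
    ```

    The regex stays simple and the range check on each octet is delegated to the parser instead of being encoded in the pattern.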


    IPv6 can become a mess (see https://en.wikipedia.org/wiki/IPv6_address#Representation).
    I would match either a full IPv6 representation (8 hex-digit groups separated by colons) or any run of hex digits, colons, and periods that contains two adjacent colons.

    
    datatable (ipv6:string) 
    [
        'IPv6 test 198.192.0.127 2345:5:2CA1:0000:0000:567:5673:256/127 in a random string'
       ,'IPv6 test 198.192.0.127 2345:5:2CA1::567:5673:256/127 in a random string'
       ,'IPv6 test 198.192.0.127 ::ffff:198.192.0.127 in a random string'
       ,'IPv6 test 198.192.0.127 ::1 in a random string'
       ,'IPv6 test 198.192.0.127 :: in a random string'
    ]
    | project pv6addr = extract(@"([[:xdigit:]]{1,4}:){7}[[:xdigit:]]{1,4}|[[:xdigit:]:.]*::[[:xdigit:]:.]*", 0, ipv6)
    
    
    
    pv6addr
    2345:5:2CA1:0000:0000:567:5673:256
    2345:5:2CA1::567:5673:256
    ::ffff:198.192.0.127
    ::1
    ::
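
    Because the second alternative accepts any run of hex digits, colons, and periods around `::`, it can also pick up strings that are not addresses. A minimal sketch of a validation step, assuming `parse_ipv6()` returns an empty string for input it cannot parse (check this against the documentation for your environment); the second row and the `candidate` column are hypothetical:

    ```kusto
    datatable (ipv6:string)
    [
        'IPv6 test 198.192.0.127 2345:5:2CA1::567:5673:256/127 in a random string'
       ,'Two double colons 1::2::3 in a random string'   // hypothetical row: "::" may appear only once
    ]
    // Same pattern as above: it only proposes a candidate substring
    | extend candidate = extract(@"([[:xdigit:]]{1,4}:){7}[[:xdigit:]]{1,4}|[[:xdigit:]:.]*::[[:xdigit:]:.]*", 0, ipv6)
    // Assumption: parse_ipv6() yields an empty string when the candidate is not a valid IPv6 address
    | project pv6addr = iff(isnotempty(parse_ipv6(candidate)), candidate, "")
    ```

    This does not catch every false positive: a short fragment that happens to be syntactically valid still passes, so a stricter pattern may be needed for noisy input.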
