python-3.x regex regex-group

Regex to capture IP address + /subnet format in a series of strings and assign each match to a group


I have gone through all the posts I could find on this site about picking IP addresses out of a given string. I took the one that works best for me and modified it to also grab the /xx subnet info at the end, for example 192.168.1.1/24.

What does NOT work: I need each of the IP matches to be put into a group, but every single example I found makes them non-capturing groups with ?:. That is useless to me because I can't grab the results to add to a spreadsheet. I'm using Python.

So:

(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?/\d{1,2})

run against this string:

blah blah  16.13.129.128/25  blah blah 18.83.130.0/24  blah blah  18.18.141.0/24  blah blah 10.17.14.0/24

does indeed match each IP:

16.13.129.128/25  
18.83.130.0/24  
18.18.141.0/24  
10.17.14.0/24  

But I can't refer to each IP as Group0, Group1, Group2, etc. for each match. I don't really understand what ?: is doing (aside from making a group non-capturing), but when I remove every ?:, expecting the groups to turn into capture groups, it absolutely murders the regex and it doesn't find the IPs anymore. I've used several regex debug sites to confirm these findings, but I don't know why it breaks completely when I just drop the ?:.

Does anyone know how to get the same regex tweaked to allow for each IP to be assigned to a capture group such as:

Group0: 16.13.129.128/25  
Group1: 18.83.130.0/24  
Group2: 18.18.141.0/24  
Group3: 10.17.14.0/24  

Solution

  • Why? Capture groups pick out pieces of a single match; they don't number successive matches, so they aren't what you need here. Just use findall and collect the matches in a list that you can enumerate or reference by index:

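    import re
    
    # The question's pattern, with /\d{1,2} moved outside the last octet group so the
    # prefix is required after every octet alternative (e.g. after 255 as well).
    ipr = re.compile(r'(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)/\d{1,2}')
    s = 'blah blah 16.13.129.128/25 blah blah 18.83.130.0/24 blah blah 18.18.141.0/24 blah blah 10.17.14.0/24'
    
    # No capture groups in the pattern, so findall returns each whole match as a string.
    ips = ipr.findall(s)
    for i, ip in enumerate(ips):
        print(f'ips[{i}] = {ip}')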
    

    Output:

    ips[0] = 16.13.129.128/25
    ips[1] = 18.83.130.0/24
    ips[2] = 18.18.141.0/24
    ips[3] = 10.17.14.0/24
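
As for why dropping every ?: seemed to break things: in Python, once a pattern contains capture groups, findall returns the group contents for each match instead of the whole matched text, so the results can look wrong even though the regex still matches the same places. The sketch below illustrates that, then shows finditer with group(0), and finally a hypothetical variant with two capture groups (address and prefix length) written out as CSV. The names OCTET, no_noncapture, cidr, rows and buf are made up for this example, and io.StringIO only stands in for a real spreadsheet file.

```python
import csv
import io
import re

# One octet (0-255), the same alternation the question uses.
OCTET = r'(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'

s = ('blah blah 16.13.129.128/25 blah blah 18.83.130.0/24 '
     'blah blah 18.18.141.0/24 blah blah 10.17.14.0/24')

# The question's regex with every ?: removed. It still matches the same text,
# but findall now returns one tuple of group contents per match.
no_noncapture = re.compile(
    r'((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}'
    r'(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?/\d{1,2})'
)
print(no_noncapture.findall(s)[0])   # ('129.', '129', '128/25')

# finditer yields match objects; group(0) is always the whole match.
print([m.group(0) for m in no_noncapture.finditer(s)])

# Hypothetical variant: two capture groups (address, prefix length), with
# /\d{1,2} placed outside the final octet so it applies to every alternative.
cidr = re.compile(r'((?:' + OCTET + r'\.){3}' + OCTET + r')/(\d{1,2})')
rows = cidr.findall(s)               # list of (address, prefix) tuples

# Write the rows as CSV; io.StringIO stands in for a real file here.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(['address', 'prefix'])
writer.writerows(rows)
print(buf.getvalue())
```

If the whole address/prefix string is all you need, the plain findall list above is enough; the grouped variant is only worth it when the address and the prefix length should land in separate spreadsheet columns.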