python, python-3.x, regex, regex-group

Convert capture group to named capture group


How would I approach converting simple capture groups to named capture groups if I provide the names as a list? I normally program in Python, but I am open to other languages that may help achieve this.

Basic Example:

Regex:

(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s(\w+)\s(\w+)\s(\d+)

Names:

["ip","name","proto","http_status_code"]

End result regex:

(?<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s(?<name>\w+)\s(?<proto>\w+)\s(?<http_status_code>\d+)

regex_data_to_test:

"172.16.1.1 bob tcp 200"

Thanks!


Solution

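  • You can use the following. Note that Python's re module writes named groups as (?P<name>...) rather than (?<name>...). The approach splits the pattern on every "(", so it breaks as soon as the regex contains nested groups, escaped parentheses such as \(, or non-capturing groups such as (?:...):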

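    reg = r"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s(\w+)\s(\w+)\s(\d+)"
    groupNames = ["ip", "name", "proto", "http_status_code"]

    # Split on "(" and drop empty pieces (such as the one before the leading "(").
    # Each remaining piece is a group body plus whatever follows it.
    splitReg = [a for a in reg.split("(") if a]
    # Only rewrite when there is exactly one name per piece.
    if len(groupNames) == len(splitReg):
        # Put a named-group opener in front of each piece and rejoin.
        newReg = "".join("(?P<" + name + ">" + val
                         for name, val in zip(groupNames, splitReg))
        print(newReg)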
    

    Output:

    (?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s(?P<name>\w+)\s(?P<proto>\w+)\s(?P<http_status_code>\d+)
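
If the pattern may contain nested groups, escaped parentheses, or non-capturing groups, one possible variation is to scan the regex character by character instead of splitting on "(". The sketch below is not part of the original answer; name_groups is a hypothetical helper name, and it assumes the pattern is not written in verbose mode (re.X), where a "(" inside a comment would be miscounted.

```python
import re

def name_groups(pattern, names):
    """Give each unnamed capture group in `pattern` a name from `names`, in order.

    Hypothetical helper, a sketch only: assumes a non-verbose pattern.
    """
    out = []
    positions = []     # indexes in `out` of each unnamed "("
    in_class = False   # True while scanning inside a [...] character class
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            # Escaped character such as \( or \d: copy both characters untouched.
            out.append(pattern[i:i + 2])
            i += 2
            continue
        if in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
            # A "]" directly after "[" or "[^" is a literal, not the end of the class.
            j = i + 1
            if pattern[j:j + 1] == "^":
                j += 1
            if pattern[j:j + 1] == "]":
                out.append(pattern[i:j + 1])
                i = j + 1
                continue
        elif ch == "(" and pattern[i + 1:i + 2] != "?":
            # "(" not followed by "?" opens a plain capture group.
            positions.append(len(out))
        out.append(ch)
        i += 1

    if len(positions) != len(names):
        raise ValueError(
            f"pattern has {len(positions)} unnamed groups, got {len(names)} names"
        )
    for pos, name in zip(positions, names):
        out[pos] = f"(?P<{name}>"
    result = "".join(out)
    re.compile(result)  # raises re.error if a name is invalid or duplicated
    return result

pattern = r"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s(\w+)\s(\w+)\s(\d+)"
named = name_groups(pattern, ["ip", "name", "proto", "http_status_code"])
print(named)
print(re.match(named, "172.16.1.1 bob tcp 200").groupdict())
```

For the example in the question this produces the same regex as the split-based version. Groups are named in the order their opening parenthesis appears, which is also how re numbers them, so an outer group receives its name before the groups nested inside it.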