python regex python-3.7

Best way to find a substring using regex in Python 3


I am trying to find the best way to extract a specific substring as a key-value pair, using re, from strings of the following form:

some_string-variable_length/some_no_variable_digit/some_no1_variable_digit/some_string1/some_string2
e.g. aba/101/11111/cde/xyz or aaa/111/1119/cde/xzx or ada/21111/5/cxe/yyz

Everything here is variable, and what I am looking for is a key-value result like the following:

`cde: 2`, as there are two entries for cde

`cxe: 1`, as there is only one cxe

Note: everything is variable here except the /. That is, cde, cxe, or some other string will always appear immediately after the third / in each case.

input: aba/101/11111/cde/xyz/blabla
output: cde:xyz/blabla
input: aaa/111/1119/cde/xzx/blabla
output: cde:xzx/blabla
input: aahjdsga/11231/1119/gfts/sjhgdshg/blabla
output: gfts:sjhgdshg/blabla

As you can see, my key is always the first string after the 3rd / and the value is always the substring after the key.


Solution

  • Here are a couple of solutions based on your description that the key is always the first string after the third / and the value is everything after the key. The first uses str.split with a maxsplit of 4, so everything after the fourth / ends up in the value; it raises an IndexError if a string contains fewer than four slashes. The second uses a regex to extract the two parts and silently skips strings that do not match:

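    import re

    inp = ['aba/101/11111/cde/xyz/blabla',
           'aaa/111/1119/cde/xzx/blabla',
           'aahjdsga/11231/1119/gfts/sjhgdshg/blabla',
           ]

    # Solution 1: split on the first four slashes only, so the fifth
    # element keeps everything after the key, slashes included.
    for s in inp:
        parts = s.split('/', 4)
        key = parts[3]
        value = parts[4]
        print(f'{key}:{value}')

    # Solution 2: skip three "segment/" groups, capture the key,
    # then capture the rest of the string as the value.
    for s in inp:
        m = re.match(r'^(?:[^/]*/){3}([^/]*)/(.*)$', s)
        if m is not None:
            key = m.group(1)
            value = m.group(2)
            print(f'{key}:{value}')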
    

    Both pieces of code produce the same output:

    cde:xyz/blabla
    cde:xzx/blabla
    gfts:sjhgdshg/blabla
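
The question also mentions wanting a count per key (cde: 2, cxe: 1). If that is the goal, one possible approach is to pull out the key with the same split and tally it with collections.Counter. This is only a sketch: the names extract_key, paths, and counts are made up for illustration, and it assumes strings with fewer than four segments should simply be skipped.

```python
from collections import Counter


def extract_key(path):
    """Return the segment after the third '/', or None if it is missing."""
    parts = path.split('/', 4)
    return parts[3] if len(parts) > 3 else None


# The three example strings from the question
paths = ['aba/101/11111/cde/xyz',
         'aaa/111/1119/cde/xzx',
         'ada/21111/5/cxe/yyz']

# Tally each key, skipping strings too short to contain one
counts = Counter(k for k in map(extract_key, paths) if k is not None)

for key, n in counts.items():
    print(f'{key}: {n}')
```

With the question's three example strings this prints cde: 2 followed by cxe: 1.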