I am trying to find the best way to extract a specific substring as a key-value pair using re, for strings of the following form:
some_string-variable_length/some_no_variable_digit/some_no1_variable_digit/some_string1/some_string2
e.g. aba/101/11111/cde/xyz or aaa/111/1119/cde/xzx or ada/21111/5/cxe/yyz
Everything here is variable. What I am looking for is a key-value result like the following:
`cde: 2` as there are two entries for cde
`cxe: 1` as there is only one cxe
Note: everything is variable here except the / separators, i.e. cde, cxe, or some other string always appears immediately after the third / in each case.
input: aba/101/11111/cde/xyz/blabla
output: cde:xyz/blabla
input: aaa/111/1119/cde/xzx/blabla
output: cde:xzx/blabla
input: aahjdsga/11231/1119/gfts/sjhgdshg/blabla
output: gfts:sjhgdshg/blabla
If you notice here, my key is always the first string after 3rd / and value is always the substring after key
Here are a couple of solutions based on your description that "key is always the first string after 3rd / and value is always the substring after key". The first uses str.split with a maxsplit of 4 to collect everything after the fourth / into the value. The second uses a regex to extract the two parts:
inp = ['aba/101/11111/cde/xyz/blabla',
       'aaa/111/1119/cde/xzx/blabla',
       'aahjdsga/11231/1119/gfts/sjhgdshg/blabla'
       ]

for s in inp:
    # maxsplit=4 yields at most 5 parts; the last part keeps any remaining slashes
    parts = s.split('/', 4)
    key = parts[3]
    value = parts[4]
    print(f'{key}:{value}')
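import re

for s in inp:
    # skip three "field/" groups, capture the fourth field as the key,
    # then capture everything after the next / as the value
    m = re.match(r'^(?:[^/]*/){3}([^/]*)/(.*)$', s)
    if m is not None:
        key = m.group(1)
        value = m.group(2)
        print(f'{key}:{value}')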
For both pieces of code, the output is:
cde:xyz/blabla
cde:xzx/blabla
gfts:sjhgdshg/blabla
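
The question also mentions wanting counts such as `cde: 2` and `cxe: 1`. A minimal sketch of one way to get those, assuming the same "key is the fourth field" rule: the helper name split_key_value and the 'too/short' sample are hypothetical additions for illustration, and collections.Counter is swapped in for the tallying. The helper also returns None for strings with fewer than five fields, which the plain str.split version above would fail on with an IndexError.

```python
from collections import Counter


def split_key_value(path):
    """Return (key, value) if path has at least five '/'-separated fields, else None.

    Hypothetical helper: key is the fourth field, value is everything after it.
    """
    parts = path.split('/', 4)
    if len(parts) < 5:
        return None
    return parts[3], parts[4]


# Sample strings from the question, plus one malformed entry (an assumption)
paths = [
    'aba/101/11111/cde/xyz',
    'aaa/111/1119/cde/xzx',
    'ada/21111/5/cxe/yyz',
    'too/short',
]

# Keep only the strings that parsed, then tally how often each key occurs
pairs = [kv for kv in map(split_key_value, paths) if kv is not None]
counts = Counter(key for key, _ in pairs)

for key, n in counts.items():
    print(f'{key}: {n}')
```

With the sample list this should print `cde: 2` followed by `cxe: 1`; the malformed entry is skipped rather than raising.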