Answer from @avinash-raj:
re.findall(r'\([^\[\]()]*\[\([^\[\]()]+source=([\w./]+)', s)
Modified question:
I have the following string:
s=string='s=<a=1 b=[(text1 [(text2 source=x.gz i=i.gz)]), ([(text3 j=1.0 source=y.gz)])] c=[([(3)])] d=[([(source=x.gz)])]>'
I want to get this list as an output:
['x.gz','y.gz']
Original question:
I have the following string:
s=string='s=<a=1 b=[([(source=x.gz i=0)]), ([(j=1 source=y.gz)])] c=[([(3)])]>'
I want to get this list as an output:
['x.gz','y.gz']
I have tried this:
re.findall(r'b=\[([^]]*)\]',s)
Which returns:
['([(source=x.gz i=0)']
I have also tried this:
re.findall(r'\[([^]]*)\]',s)
Which returns:
['([(source=x.gz i=0)', '(j=1 source=y.gz)', '([(3)']
I am equally happy for a one line answer or being pointed to a tutorial, which enables me to find the answer myself upon completion of it. Thanks.
EDIT1: Changed string (see answers below):
s=string='s=<a=1 b=[([(source=x.gz i=0)]), ([(j=1 source=y.gz)])] c=[([(3)])] source=4>'
EDIT2: Changed string (no answers provided, but I'll provide it myself):
s=string='s=<a=1 b=[(text1 [(text2 source=x.gz i=i.gz)]), ([(text3 j=1.0 source=y.gz)])] c=[([(3)])] d=[([(source=x.gz)])]>'
I tried this:
re.findall(r'(?<=b=)\[\(.*?[\S]*?source=([\w\./]+)', s)
But it only returns:
['x.gz']
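A short check of why this attempt stops after one result, offered as a sketch rather than a full diagnosis: the lookbehind `(?<=b=)` pins every match to start right after `b=`, and `b=` occurs only once in the string, so `re.findall` can return at most one match. The variable names below are illustrative only.

```python
import re

s = ('s=<a=1 b=[(text1 [(text2 source=x.gz i=i.gz)]), '
     '([(text3 j=1.0 source=y.gz)])] c=[([(3)])] d=[([(source=x.gz)])]>')

attempt = r'(?<=b=)\[\(.*?[\S]*?source=([\w\./]+)'

# Every match must begin immediately after "b=", so the number of
# occurrences of "b=" is an upper bound on the number of matches.
anchor_count = s.count('b=')
matches = re.findall(attempt, s)

print(anchor_count)  # 1
print(matches)       # ['x.gz']
```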
Use capturing groups to capture the characters that you want to print.
>>> string = 's=<a=1 b=[([(source=x.gz i=0)]), ([(j=1 source=y.gz)])] c=[([(3)])] source=4>'
>>> re.findall(r'\(\[\([^\[\]()]*source=([\w.]+)', string)
['x.gz', 'y.gz']
Explanation:
\(\[\( matches the characters ([( literally.
[^\[\]()]* is a negated character class that matches any character except [, ], ( or ), zero or more times.
source= matches the string source= literally.
([\w.]+) captures one or more word characters or dots.
Update:
>>> string = 's=<a=1 b=[(text1 [(text2 source=x.gz i=i.gz)]), ([(text3 j=1.0 source=y.gz)])] c=[([(3)])] d=[([(source=x.gz)])]>'
>>> re.findall(r'\([^\[\]()]*\[\([^\[\]()]+source=([\w.]+)', string)
['x.gz', 'y.gz']
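The updated pattern can be easier to follow when written with `re.VERBOSE`. The sketch below is the same regex with one comment per piece, and the name `PATTERN` is only illustrative. One thing to note: the `+` after the second character class requires at least one character between `[(` and `source=`. In this particular string that is what keeps `d=[([(source=x.gz)])]` out of the result, but it also means an entry inside `b=` of the form `[(source=...` would be skipped, so treat this as tailored to the sample input rather than a general parser.

```python
import re

PATTERN = re.compile(r"""
    \(              # an opening parenthesis
    [^\[\]()]*      # optional run of characters that are not brackets or parentheses
    \[\(            # the two literal characters that open the inner group
    [^\[\]()]+      # at least one character that is not a bracket or parenthesis
    source=         # the literal key
    ([\w.]+)        # capture the value: word characters and dots
""", re.VERBOSE)

s = ('s=<a=1 b=[(text1 [(text2 source=x.gz i=i.gz)]), '
     '([(text3 j=1.0 source=y.gz)])] c=[([(3)])] d=[([(source=x.gz)])]>')

result = PATTERN.findall(s)
print(result)  # ['x.gz', 'y.gz']
```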