Tags: python, regex, python-3.x, parentheses, square-bracket

Regular expression, multiple parentheses and square brackets


Answer from @avinash-raj:

re.findall(r'\([^\[\]()]*\[\([^\[\]()]+source=([\w./]+)', s)
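A self-contained sketch of that answer, run against the modified string. Its capture class [\w./] also accepts "/", so path-like source values are captured whole. The second input, s_path, is a hypothetical string made up only to illustrate that point.

```python
import re

# Pattern from the answer: "(", a run of non-bracket characters, then "[("
# followed by at least one non-bracket character before "source=".
PATTERN = re.compile(r'\([^\[\]()]*\[\([^\[\]()]+source=([\w./]+)')

s = 's=<a=1 b=[(text1 [(text2 source=x.gz i=i.gz)]), ([(text3 j=1.0 source=y.gz)])] c=[([(3)])] d=[([(source=x.gz)])]>'
print(PATTERN.findall(s))  # ['x.gz', 'y.gz']

# Hypothetical input with a path-like value: "/" is part of the capture class.
s_path = 'b=[(text1 [(text2 source=data/x.gz i=0)])]'
print(PATTERN.findall(s_path))  # ['data/x.gz']
```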

Modified question:

I have the following string:

s=string='s=<a=1 b=[(text1 [(text2 source=x.gz i=i.gz)]), ([(text3 j=1.0 source=y.gz)])] c=[([(3)])] d=[([(source=x.gz)])]>'

I want to get this list as an output:

['x.gz','y.gz']

Original question:

I have the following string:

s=string='s=<a=1 b=[([(source=x.gz i=0)]), ([(j=1 source=y.gz)])] c=[([(3)])]>'

I want to get this list as an output:

['x.gz','y.gz']

I have tried this:

re.findall(r'b=\[([^]]*)\]',s)

Which returns:

['([(source=x.gz i=0)']

I have also tried this:

re.findall(r'\[([^]]*)\]',s)

Which returns:

['([(source=x.gz i=0)', '(j=1 source=y.gz)', '([(3)']

I would be equally happy with a one-line answer or with a pointer to a tutorial that would teach me enough to find the answer myself. Thanks.

EDIT1: Changed string (see answers below):

s=string='s=<a=1 b=[([(source=x.gz i=0)]), ([(j=1 source=y.gz)])] c=[([(3)])] source=4>'

EDIT2: Changed string (no answers provided, but I'll provide it myself):

s=string='s=<a=1 b=[(text1 [(text2 source=x.gz i=i.gz)]), ([(text3 j=1.0 source=y.gz)])] c=[([(3)])] d=[([(source=x.gz)])]>'

I tried this:

re.findall(r'(?<=b=)\[\(.*?[\S]*?source=([\w\./]+)', s)

But it only returns:

['x.gz']
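That attempt yields a single item because the lookbehind (?<=b=) can only succeed at a position directly after "b=", and that substring occurs only once in the string. re.findall returns non-overlapping matches, so one possible starting point means at most one result. A small check of this reasoning:

```python
import re

s = 's=<a=1 b=[(text1 [(text2 source=x.gz i=i.gz)]), ([(text3 j=1.0 source=y.gz)])] c=[([(3)])] d=[([(source=x.gz)])]>'

# The lookbehind only allows a match to start right after "b=".
print(s.count('b='))  # 1: only one place where a match can start

attempt = re.compile(r'(?<=b=)\[\(.*?[\S]*?source=([\w\./]+)')
print(attempt.findall(s))  # ['x.gz']
```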

Solution

  • Use a capturing group around the part you want to extract. When the pattern contains exactly one group, re.findall returns only the text matched by that group.

    >>> import re
    >>> string = 's=<a=1 b=[([(source=x.gz i=0)]), ([(j=1 source=y.gz)])] c=[([(3)])] source=4>'
    >>> re.findall(r'\(\[\([^\[\]()]*source=([\w.]+)', string)
    ['x.gz', 'y.gz']
    

    Explanation:

    • \(\[\( matches the characters ([( literally.
    • [^\[\]()]* is a negated character class that matches any character except [, ], ( or ), zero or more times.
    • source= matches the literal string source=
    • ([\w.]+) captures one or more word characters or dots.
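The same pattern can be written with re.VERBOSE so that each piece carries the comment from the explanation above. This is only a readability rewrite of the one-line pattern; whitespace and # comments outside the character classes are ignored in verbose mode, so it should match exactly the same text.

```python
import re

pattern = re.compile(r'''
    \(\[\(          # the literal characters ([(
    [^\[\]()]*      # any characters except [ ] ( ), zero or more times
    source=         # the literal text source=
    ([\w.]+)        # capture one or more word characters or dots
''', re.VERBOSE)

string = 's=<a=1 b=[([(source=x.gz i=0)]), ([(j=1 source=y.gz)])] c=[([(3)])] source=4>'
print(pattern.findall(string))  # ['x.gz', 'y.gz']
```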

    Update (for the EDIT2 string):

    >>> string = 's=<a=1 b=[(text1 [(text2 source=x.gz i=i.gz)]), ([(text3 j=1.0 source=y.gz)])] c=[([(3)])] d=[([(source=x.gz)])]>'
    >>> re.findall(r'\([^\[\]()]*\[\([^\[\]()]+source=([\w.]+)', string)
    ['x.gz', 'y.gz']
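In this particular string, the value inside d=[([(source=x.gz)])] is left out because of the + in [^\[\]()]+: it demands at least one character between "[(" and "source=", and in the d part "source=" follows "[(" directly. The comparison below swaps + for * purely to show that effect; the with_star pattern is not a suggested fix.

```python
import re

string = 's=<a=1 b=[(text1 [(text2 source=x.gz i=i.gz)]), ([(text3 j=1.0 source=y.gz)])] c=[([(3)])] d=[([(source=x.gz)])]>'

# Pattern from the update: "+" requires at least one character between "[(" and "source=".
with_plus = re.compile(r'\([^\[\]()]*\[\([^\[\]()]+source=([\w.]+)')
# Same pattern with "*" instead of "+", for comparison only.
with_star = re.compile(r'\([^\[\]()]*\[\([^\[\]()]*source=([\w.]+)')

print(with_plus.findall(string))  # ['x.gz', 'y.gz']
print(with_star.findall(string))  # ['x.gz', 'y.gz', 'x.gz'], the last one from d=[([(source=x.gz)])]
```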