python regex python-re

Failed to create the right pattern to extract the desired portion from a string using regex


What would be the right pattern if I applied regex to the following string:

item = "JavaScript:muestra('01','043','071','01','600%2D2023%2DSUNARP%2DTR')"

and the result I wish to get is:

0104307101600-2023-SUNARP-TR

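I've tried the following, but it prints (0104307101600%2D2023%2DSUNARP%2DTR), with the parentheses and the %2D escapes still in place: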

import re

item = "JavaScript:muestra('01','043','071','01','600%2D2023%2DSUNARP%2DTR')"
content = re.findall(r"(\('.+?'\))",item)[0].replace("'","").replace(",","")
print(content)

Solution

  • Try:

    >>> import re
    >>> item = "JavaScript:muestra('01','043','071','01','600%2D2023%2DSUNARP%2DTR')"
    >>> chunks = re.findall(r"'([^']+)'", item)
    >>> chunks
    ['01', '043', '071', '01', '600%2D2023%2DSUNARP%2DTR']
    >>> chunks[-1]
    '600%2D2023%2DSUNARP%2DTR'
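
The snippet above stops at the list of chunks. To get from there to the string asked for in the question, one option is to join the chunks and decode the percent-escapes. This is a minimal sketch, assuming the %2D sequences are ordinary URL percent-encoding (which urllib.parse.unquote from the standard library decodes); the variable name result is only illustrative.

```python
import re
from urllib.parse import unquote

item = "JavaScript:muestra('01','043','071','01','600%2D2023%2DSUNARP%2DTR')"

# Pull out every single-quoted chunk, as in the snippet above.
chunks = re.findall(r"'([^']+)'", item)

# Concatenate the chunks, then decode percent-escapes (%2D is "-").
result = unquote("".join(chunks))
print(result)  # 0104307101600-2023-SUNARP-TR
```

Joining before decoding means unquote runs once on the whole string rather than once per chunk.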
    

    If this feels too "loose" (it picks up any single-quoted text anywhere in the string), you can first isolate the parenthesized part with something like:

    paren = re.search(r"\(([^)]+)\)", item)
    

    or, more precisely:

    paren = re.search(r"JavaScript:muestra\(([^)]+)\)", item)
    

    then run the quote-based re.findall from the first snippet on paren[1] (the captured group) instead of on item.
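
Put together, the two-step version might look like the sketch below. It assumes the input always uses the JavaScript:muestra(...) form shown in the question; falling back to an empty list when the search finds nothing is a choice made here for illustration, not part of the original answer.

```python
import re

item = "JavaScript:muestra('01','043','071','01','600%2D2023%2DSUNARP%2DTR')"

# Step 1: isolate the argument list between the parentheses.
paren = re.search(r"JavaScript:muestra\(([^)]+)\)", item)

# Step 2: extract the quoted chunks from that substring only.
# re.search returns None when nothing matches, so guard before indexing.
if paren is None:
    chunks = []
else:
    chunks = re.findall(r"'([^']+)'", paren[1])

print(chunks)
```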

    Using re.split() can also work:

    [x for x in re.split(r"'(?:,')?", paren[1]) if x]
    

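    Another approach is to use ast.literal_eval to convert the parenthesized substring into a tuple (note that this pattern has no capture group, so paren[0] keeps the surrounding parentheses):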

    >>> import ast
    >>> paren = re.search(r"\([^)]+\)", item)
    >>> ast.literal_eval(paren[0])
    ('01', '043', '071', '01', '600%2D2023%2DSUNARP%2DTR')
    >>> tup = ast.literal_eval(paren[0])
    >>> tup[-1]
    '600%2D2023%2DSUNARP%2DTR'
    

    Boiling this down to a one-liner without regex:

    >>> ast.literal_eval(item.replace("JavaScript:muestra", ""))[-1]
    '600%2D2023%2DSUNARP%2DTR'
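
The one-liner above raises an exception if the string is not a valid Python literal once the prefix is removed. If the input may vary, a more defensive variant could look like the sketch below. The function name parse_args and its prefix parameter are hypothetical, and only the most common failure modes of ast.literal_eval (ValueError and SyntaxError) are handled.

```python
import ast

def parse_args(item, prefix="JavaScript:muestra"):
    """Return the call's arguments as a tuple, or None if parsing fails."""
    if not item.startswith(prefix):
        return None
    try:
        # What remains after the prefix should look like "('a','b',...)".
        value = ast.literal_eval(item[len(prefix):])
    except (ValueError, SyntaxError):
        return None
    # A single quoted argument such as "('x')" evaluates to a str, not a tuple.
    if isinstance(value, str):
        return (value,)
    return value if isinstance(value, tuple) else None

item = "JavaScript:muestra('01','043','071','01','600%2D2023%2DSUNARP%2DTR')"
args = parse_args(item)
print(args[-1])
```

Slicing off the prefix instead of calling str.replace avoids touching the same text if it happens to appear inside one of the arguments.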