What would be the right pattern if I applied regex to the following string:
item = "JavaScript:muestra('01','043','071','01','600%2D2023%2DSUNARP%2DTR')"
and the result I wish to get is:
0104307101600-2023-SUNARP-TR
I've tried with:
import re
item = "JavaScript:muestra('01','043','071','01','600%2D2023%2DSUNARP%2DTR')"
content = re.findall(r"(\('.+?'\))",item)[0].replace("'","").replace(",","")
print(content)
Try:
>>> import re
>>> item = "JavaScript:muestra('01','043','071','01','600%2D2023%2DSUNARP%2DTR')"
>>> chunks = re.findall(r"'([^']+)'", item)
>>> chunks
['01', '043', '071', '01', '600%2D2023%2DSUNARP%2DTR']
>>> chunks[-1]
'600%2D2023%2DSUNARP%2DTR'
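To get from these chunks to the exact string asked for in the question, one option is to join them and then percent-decode the result. This is a minimal sketch, assuming the `%2D` sequences are ordinary URL percent-encoding, which `urllib.parse.unquote` from the standard library decodes:

```python
import re
from urllib.parse import unquote

item = "JavaScript:muestra('01','043','071','01','600%2D2023%2DSUNARP%2DTR')"

# Pull out every single-quoted chunk.
chunks = re.findall(r"'([^']+)'", item)

# Glue the chunks together, then decode %2D back into "-".
result = unquote("".join(chunks))
print(result)
```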
If this feels too "loose", you can first grab the parenthesized area with something like:
paren = re.search(r"\(([^)]+)\)", item)
or, more precisely:
paren = re.search(r"JavaScript:muestra\(([^)]+)\)", item)
then run the '-based match from the top snippet on paren[1].
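Put together, the two steps might look like the sketch below. The helper name `extract_args` is made up for illustration, and returning an empty list when the call is not found is just one possible choice. Note that `[^)]+` stops at the first `)`, so this assumes no argument contains a closing parenthesis.

```python
import re

def extract_args(text):
    """Return the quoted arguments of a JavaScript:muestra(...) call, or [] if absent."""
    # Step 1: isolate what sits between the parentheses of the muestra call.
    paren = re.search(r"JavaScript:muestra\(([^)]+)\)", text)
    if paren is None:
        return []
    # Step 2: run the quote-based match on that substring only.
    return re.findall(r"'([^']+)'", paren[1])

item = "JavaScript:muestra('01','043','071','01','600%2D2023%2DSUNARP%2DTR')"
print(extract_args(item))
```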
Using re.split() can also work:
[x for x in re.split(r"'(?:,')?", paren[1]) if x]
Another approach is ast.literal_eval to convert the parenthesized substring into a tuple:
>>> import ast
>>> paren = re.search(r"\([^)]+\)", item)
>>> ast.literal_eval(paren[0])
('01', '043', '071', '01', '600%2D2023%2DSUNARP%2DTR')
>>> tup = ast.literal_eval(paren[0])
>>> tup[-1]
'600%2D2023%2DSUNARP%2DTR'
Boiling this down to a one-liner without regex:
>>> ast.literal_eval(item.replace("JavaScript:muestra", ""))[-1]
'600%2D2023%2DSUNARP%2DTR'
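One caveat with the `ast.literal_eval` route: if the call ever has a single argument, `('x')` evaluates to a plain string rather than a one-element tuple, so indexing or joining would operate on characters. A hedged sketch that guards against this and also produces the joined, decoded string from the question (the helper name `muestra_code` is hypothetical, and it assumes the arguments are always valid Python string literals):

```python
import ast
from urllib.parse import unquote

def muestra_code(text):
    # Strip the function name so only "(...)" remains, then parse it as a Python literal.
    args = ast.literal_eval(text.replace("JavaScript:muestra", "", 1))
    # "('x')" parses to the string 'x', not a tuple, so normalise that case.
    if isinstance(args, str):
        args = (args,)
    # Join all arguments and decode %2D back into "-".
    return unquote("".join(args))

item = "JavaScript:muestra('01','043','071','01','600%2D2023%2DSUNARP%2DTR')"
print(muestra_code(item))
```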