My goal is to find the parenthesized abbreviation that appears right after @PROG$ and replace it with @PROG$ (e.g. (ALI) -> @PROG$).
Input
s = "Background (UNASSIGNED): Previous study of ours showed that @PROG$ (ALI) and C-reactive protein (CRP) are independent significant prognostic factors in operable non-small cell lung cancer (NSCLC) patients."
Output
"Background (UNASSIGNED): Previous study of ours showed that @PROG$ @PROG$ and C-reactive protein (CRP) are independent significant prognostic factors in operable non-small cell lung cancer (NSCLC) patients."
I tried re.findall(r'(\(.*?\))', s), which gave me all the abbreviations, not just the one after @PROG$. How do I go on from here? What do I need to fix?
You can use a re.sub solution like this:
import re
s = "Background (UNASSIGNED): Previous study of ours showed that @PROG$ (ALI) and C-reactive protein (CRP) are independent significant prognostic factors in operable non-small cell lung cancer (NSCLC) patients."
print( re.sub(r'(@PROG\$\s+)\([A-Z]+\)', r'\1@PROG$', s) )
# => Background (UNASSIGNED): Previous study of ours showed that @PROG$ @PROG$ and C-reactive protein (CRP) are independent significant prognostic factors in operable non-small cell lung cancer (NSCLC) patients.
The regex is
(@PROG\$\s+)\([A-Z]+\)
Details:
(@PROG\$\s+) - Group 1 (\1 refers to this group value from the replacement pattern): @PROG$ and one or more whitespaces
\( - a ( char
[A-Z]+ - one or more uppercase ASCII letters (replace with [^()]* to match anything in between parentheses except for ( and ))
\) - a ) char.
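To make the difference between [A-Z]+ and [^()]* concrete, here is a minimal sketch that runs both variants on the same text. The helper name replace_abbreviation, the constants STRICT and LOOSE, and the sample string are made up for illustration; only the two patterns come from the explanation above.

```python
import re

# Strict variant: only uppercase ASCII letters are allowed inside the parentheses.
STRICT = re.compile(r'(@PROG\$\s+)\([A-Z]+\)')
# Looser variant: anything except parentheses, so digits, hyphens or lowercase also match.
LOOSE = re.compile(r'(@PROG\$\s+)\([^()]*\)')

def replace_abbreviation(text, pattern=STRICT):
    """Replace the parenthesized abbreviation right after @PROG$ with @PROG$.

    Hypothetical helper: \\1 puts back "@PROG$" plus the whitespace that followed it.
    """
    return pattern.sub(r'\1@PROG$', text)

# Made-up input: the abbreviation after @PROG$ contains a hyphen and a digit.
sample = "that @PROG$ (ALI-2) and C-reactive protein (CRP) are"

print(replace_abbreviation(sample))         # unchanged, "-2" is not matched by [A-Z]+
print(replace_abbreviation(sample, LOOSE))  # that @PROG$ @PROG$ and C-reactive protein (CRP) are
```

In both cases (CRP) is left alone because it does not follow @PROG$. Note that [^()]* also matches empty parentheses; use [^()]+ if () should not be replaced.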