I am working on a function that identifies whether LP or LLP appears, preceded or not by a space, after a double quote (") at any position in the string. If it does, I'd like to move the LP or LLP substring inside the quoted substring, as shown below.
# input
'blabla "RANDOM COMPANY ONE "LLP blabla'
'blabla "RANDOM COMPANY TWO " LLP blabla'
'blabla "RANDOM COMPANY THREE " LP blabla'
'blabla "RANDOM COMPANY FOUR "LP blabla'
# output
'blabla "RANDOM COMPANY ONE LLP" blabla'
'blabla "RANDOM COMPANY TWO LLP" blabla'
'blabla "RANDOM COMPANY THREE LP" blabla'
'blabla "RANDOM COMPANY FOUR LP" blabla'
So far, I got to this function, which almost does what I want:
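import re

def fix_entity_broken_by_quotes(text):
    # A quote, optional whitespace, then LP or LLP (captured as group 1)
    match = r'"\s*(LL?P)'
    replace = r'" \1 "'
    # Substitute, then collapse runs of whitespace into single spaces
    return ' '.join(re.sub(match, replace, text).split())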
# run
>>> fix_entity_broken_by_quotes('blabla "RANDOM COMPANY ONE" LLP blabla')
Out[1]: 'blabla "RANDOM COMPANY ONE" LLP " blabla'
I do not want the " after ONE in the resulting string.
As always, any hint or explanation on what I am missing is very welcome.
Thanks!
hint or explanation on what I am missing is very welcome.
You have a leading " in your replace:
match = r'"\s*(LL?P)'
replace = r'" \1 "'
Changing replace to r' \1 "' should help.
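With only that change the stray quote after ONE goes away, but a space can still end up before the closing quote (for example ONE LLP "), because the replacement puts a space in front of the quote and the whitespace normalization does not remove it. A minimal sketch of one way to get exactly the output listed in the question, assuming it is acceptable to also consume any whitespace in front of the quote: the extra leading \s* and the trailing \b are my additions, not part of the original pattern.

```python
import re

def fix_entity_broken_by_quotes(text):
    # Optional whitespace, the misplaced quote, optional whitespace,
    # then LP or LLP as a whole word (\b is an added guard, an assumption).
    pattern = r'\s*"\s*(LL?P)\b'
    # Emit one space, the suffix, then the quote directly after it.
    replacement = r' \1"'
    # Collapse any remaining runs of whitespace into single spaces.
    return ' '.join(re.sub(pattern, replacement, text).split())

print(fix_entity_broken_by_quotes('blabla "RANDOM COMPANY ONE "LLP blabla'))
print(fix_entity_broken_by_quotes('blabla "RANDOM COMPANY THREE " LP blabla'))
```

The opening quote is left alone because it is not followed by LP or LLP; a company name that itself begins with LP or LLP as a separate word would still be matched, so treat this as a starting point rather than a complete solution.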