python, regex, double-quotes

regex to bring a word inside a quoted substring


I am working on a function that checks whether LP or LLP appears after a " anywhere in the string, with or without a space in between. If it does, I'd like to move the LP or LLP substring inside the quoted substring, as shown below.

# input
'blabla "RANDOM COMPANY ONE "LLP blabla'
'blabla "RANDOM COMPANY TWO " LLP blabla'
'blabla "RANDOM COMPANY THREE " LP blabla'
'blabla "RANDOM COMPANY FOUR "LP blabla'

# output
'blabla "RANDOM COMPANY ONE LLP" blabla'
'blabla "RANDOM COMPANY TWO LLP" blabla'
'blabla "RANDOM COMPANY THREE LP" blabla'
'blabla "RANDOM COMPANY FOUR LP" blabla'

So far I have this function, which almost does what I want:

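import re

def fix_entity_broken_by_quotes(text):
    # a double quote, optional whitespace, then LP or LLP (captured as group 1)
    match = r'"\s*(LL?P)'
    replace = r'" \1 "'

    # substitute, then collapse runs of whitespace into single spaces
    return ' '.join(re.sub(match, replace, text).split())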

# run

>>> fix_entity_broken_by_quotes('blabla "RANDOM COMPANY ONE" LLP blabla')
Out[1]: 'blabla "RANDOM COMPANY ONE" LLP " blabla'

I don't want the " to stay after ONE in the resulting string.
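A quick way to see where the extra quote comes from is to look at what the pattern actually matches. The sketch below uses only `re.search` and `re.sub` from the standard library on the test string from the run above:

```python
import re

pattern = r'"\s*(LL?P)'
text = 'blabla "RANDOM COMPANY ONE" LLP blabla'

m = re.search(pattern, text)
# The match begins at the closing quote after ONE, so that quote is consumed:
print(repr(m.group(0)))  # '" LLP'
print(repr(m.group(1)))  # 'LLP'

# The replacement r'" \1 "' writes the consumed quote back in front of LLP and
# adds a second quote after it, hence the extra " in the output.
print(re.sub(pattern, r'" \1 "', text))
# blabla "RANDOM COMPANY ONE" LLP " blabla
```

In other words, the regex finds the right place; the problem is entirely in what the replacement string puts back.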

As always, any hint or explanation on what I am missing is very welcome.

Thanks!


Solution

  • "any hint or explanation on what I am missing is very welcome"

You have a leading " in your replace:

    match = r'"\s*(LL?P)'
    replace = r'" \1 "'
    

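Changing replace to r' \1"' fixes it: the quote consumed by the match is no longer written back, and the closing quote lands directly after LP/LLP, so the ' '.join(...split()) step only has to collapse the double space left in front of the suffix. (With r' \1 "' you would still get a stray space before the closing quote, e.g. "RANDOM COMPANY ONE LLP ".)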
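Below is a sketch of two ways to apply this, checked against the four examples from the question plus the test string from the run. The first keeps the question's function and only changes the replacement; the trailing `\b` in its pattern is an extra assumption (it keeps words such as "LPG" from matching). The second is a different technique, not the answer's: it passes a function to `re.sub` and rewrites whole "..." pairs. `fix_entity_keep_spacing` and `QUOTED_WITH_SUFFIX` are hypothetical names.

```python
import re

# Minimal fix to the original function (sketch). Two changes:
#   1. the replacement no longer starts with a quote, so the quote consumed
#      by the match is dropped instead of being written back;
#   2. the closing quote follows the suffix directly (no space before it).
# The trailing \b is an extra assumption: it keeps words such as "LPG" out.
def fix_entity_broken_by_quotes(text):
    match = r'"\s*(LL?P)\b'
    replace = r' \1"'
    # ' '.join(...split()) removes the double space left before the suffix,
    # but it also collapses every other run of whitespace in the string.
    return ' '.join(re.sub(match, replace, text).split())


# Alternative (hypothetical name): consume each complete "..." pair from left
# to right, together with an optional LP/LLP right after it, and rebuild
# only the pairs that have such a suffix. No whitespace clean-up pass needed.
QUOTED_WITH_SUFFIX = re.compile(r'"([^"]*)"(?:\s*(LL?P)\b)?')


def fix_entity_keep_spacing(text):
    def repl(m):
        body, suffix = m.group(1), m.group(2)
        if suffix is None:
            # quoted string not followed by LP/LLP: leave it unchanged
            return m.group(0)
        # drop trailing spaces inside the quotes, then append the suffix
        return '"' + body.rstrip() + ' ' + suffix + '"'

    return QUOTED_WITH_SUFFIX.sub(repl, text)


cases = [
    ('blabla "RANDOM COMPANY ONE "LLP blabla', 'blabla "RANDOM COMPANY ONE LLP" blabla'),
    ('blabla "RANDOM COMPANY TWO " LLP blabla', 'blabla "RANDOM COMPANY TWO LLP" blabla'),
    ('blabla "RANDOM COMPANY THREE " LP blabla', 'blabla "RANDOM COMPANY THREE LP" blabla'),
    ('blabla "RANDOM COMPANY FOUR "LP blabla', 'blabla "RANDOM COMPANY FOUR LP" blabla'),
    ('blabla "RANDOM COMPANY ONE" LLP blabla', 'blabla "RANDOM COMPANY ONE LLP" blabla'),
]

for src, expected in cases:
    # prints "True True" for each case
    print(fix_entity_broken_by_quotes(src) == expected,
          fix_entity_keep_spacing(src) == expected)
```

Both assume the quotes in the input are balanced. The pair-based version has two practical advantages: it leaves tabs, newlines and repeated spaces elsewhere in the text alone, and it does not fire on a quoted string that merely starts with LP/LLP (e.g. "LLP X"), which the single-pattern version would break apart.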