python regex text nlp text-processing

Python regex: how to match a substring without replacing part of it


I have the following sentence:

sentence = "Work \nExperience \n\n First Experience..."

Work 
Experience 

 First Experience...

I want to remove the "\n" between Work and Experience, but keep the "\n\n" after Experience:

Work Experience 

First Experience...

I've tried different solutions, such as:

string = re.sub(" \n{1}[^\n]"," ",sentence)

but all of them also remove the first character after the \n (the E).
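A minimal sketch of why that happens, and one possible way around it: `[^\n]` is part of the match, so the character it matches is replaced along with the newline. A negative lookahead `(?!\n)` only checks the next character without consuming it. This is an alternative technique, not the one from the accepted answer, and the variable names `consumed` and `kept` are only illustrative.

```python
import re

sentence = "Work \nExperience \n\n First Experience..."

# [^\n] matches (and therefore consumes) the character after the newline,
# so the "E" is replaced together with " \n".
consumed = re.sub(" \n{1}[^\n]", " ", sentence)

# (?!\n) only asserts that the next character is not a newline;
# it is not part of the match, so the "E" is left in place.
kept = re.sub(r" \n(?!\n)", " ", sentence)

print(repr(consumed))  # 'Work xperience \n\n First Experience...'
print(repr(kept))      # 'Work Experience \n\n First Experience...'
```

Note that this version relies on the space before the single newline, as in the example sentence; other inputs may need a different pattern.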

Update: I found a solution thanks to @Wiktor:

print(re.sub(r'\w .*?\w+', lambda x: x.group().replace('\n', ''), sentence, flags=re.S))


Solution

  • For a generic solution that removes any number of newlines (\n) between two given strings, you can use

    import re
    sentence = "Work \nExperience \n\n First Experience..."
    print( re.sub(r'Work.*?Experience', lambda x: x.group().replace('\n', ''), sentence, flags=re.S) )
    

    Output:

    Work Experience 
    
     First Experience...
    

    With re.S, the pattern Work.*?Experience matches the shortest substring that starts at Work and ends at Experience (both included), even when it spans several lines. For each match, re.sub passes the match object (x) to the lambda, which removes all newlines with .replace('\n', ''); the returned string is then used as the replacement for that match.
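The same idea can be wrapped in a small helper for arbitrary delimiters. This is only a sketch: the function name `remove_newlines_between` and its parameters are hypothetical, and `re.escape` is added on the assumption that the delimiters should be treated as literal text rather than as regex syntax.

```python
import re

def remove_newlines_between(text, start, end):
    """Remove every newline found between `start` and `end` (both literal strings)."""
    # Escape the delimiters so characters such as "." or "(" are matched literally.
    pattern = re.escape(start) + r".*?" + re.escape(end)
    # re.S lets "." match newlines; the lambda strips them from each match.
    return re.sub(pattern, lambda m: m.group().replace("\n", ""), text, flags=re.S)

sentence = "Work \nExperience \n\n First Experience..."
result = remove_newlines_between(sentence, "Work", "Experience")
print(repr(result))  # 'Work Experience \n\n First Experience...'
```

Because the quantifier is lazy, each match stops at the first occurrence of `end` after `start`, so newlines after that point are left untouched.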