
Python regex to replace a particular line only inside paragraphs, not in the whole file


    s = """Paragraph 1
    some text blah blah
    blah blah
    UNWANTED TEXT
    some text
    Paragraph END

    UNWANTED TEXT

    Paragraph 2
    some text blah blah
    blah blah
    UNWANTED TEXT
    Paragraph END"""

I want Python code that uses re.sub to replace UNWANTED TEXT only inside the paragraphs, and keep the UNWANTED TEXT outside the paragraphs unchanged. This is what I tried:

    import re

    # Finds the text between "Paragraph" and "END" (used below only as a yes/no check)
    search_unwanted_only_inparagrap = re.findall(r'(?s)(?<=Paragraph)(.*?)(?=END)', s, flags=re.MULTILINE)
    if search_unwanted_only_inparagrap:
        # re.sub is applied to the whole string s, not to the matches found above
        replace_only_insidepara = re.sub(r"UNWANTED TEXT+", " ", s)
        print(replace_only_insidepara)
    else:
        print("not found")

But the output replaces every instance of UNWANTED TEXT throughout the file:

    Paragraph 1
    some text blah blah
    blah blah

    some text
    Paragraph END



    Paragraph 2
    some text blah blah
    blah blah

    Paragraph END

But I expect this:

    Paragraph 1
    some text blah blah
    blah blah

    some text
    Paragraph END

    UNWANTED TEXT

    Paragraph 2
    some text blah blah
    blah blah

    Paragraph END

Please help.


Solution

  • Your demo input could have been more minimal. However, if I understand your requirement correctly, re.split works:

    import re
    
    s = """Paragraph 1
    some text blah blah
    blah blah
    UNWANTED TEXT
    some text
    Paragraph END
    
    UNWANTED TEXT
    
    Paragraph 2
    some text blah blah
    blah blah
    UNWANTED TEXT
    Paragraph END"""
    # The capturing group makes re.split keep each paragraph in the result list
    reg_para = re.compile(r'(Paragraph\s+\d+.+?END)', re.DOTALL)
    paras = reg_para.split(s)
    for para in paras:
        # Only the paragraph chunks match reg_para; the text between them does not
        if reg_para.match(para):
            para = re.sub(r"UNWANTED TEXT", " ", para)
            # To replace more words, repeat the substitution
            # (or loop over a list of keywords):
            para = re.sub(r"Another WORD", " ", para)
        print(para)
    

    Output:

    Paragraph 1
    some text blah blah
    blah blah
     
    some text
    Paragraph END
    
    
    UNWANTED TEXT
    
    
    Paragraph 2
    some text blah blah
    blah blah
     
    Paragraph END
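An alternative that avoids splitting is to pass a function as the replacement argument of `re.sub`: the paragraph pattern finds each paragraph, and the function rewrites only the text of that one match. This is a minimal sketch assuming the same sample text; the `unwanted` list (including "Another WORD") and the helper name `clean_paragraph` are hypothetical, chosen for illustration.

```python
import re

# Same sample text: one UNWANTED TEXT line sits between the two paragraphs.
s = """Paragraph 1
some text blah blah
blah blah
UNWANTED TEXT
some text
Paragraph END

UNWANTED TEXT

Paragraph 2
some text blah blah
blah blah
UNWANTED TEXT
Paragraph END"""

# A paragraph runs from "Paragraph <number>" to the first following "END".
reg_para = re.compile(r'Paragraph\s+\d+.+?END', re.DOTALL)

# Hypothetical list of phrases to blank out; re.escape keeps any
# regex metacharacters in them literal, and "|" joins them into one pattern.
unwanted = ["UNWANTED TEXT", "Another WORD"]
reg_unwanted = re.compile("|".join(map(re.escape, unwanted)))

def clean_paragraph(match):
    # Called once per paragraph; only the matched paragraph text is changed.
    return reg_unwanted.sub(" ", match.group(0))

result = reg_para.sub(clean_paragraph, s)
print(result)
```

Because the text between paragraphs is never passed to `clean_paragraph`, the UNWANTED TEXT line between the paragraphs survives, and the original line breaks are kept exactly (printing each `re.split` chunk separately adds a newline after every chunk). The lazy `.+?END` stops at the first `END`, so if paragraph bodies can contain those letters, a stricter end marker such as `^Paragraph END$` with `re.MULTILINE` added is safer.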