python, notepad++

Batch word replacement in a Unicode text file using Python and Notepad++


I am having a problem with a Unicode text file, using the Notepad++ Python Script plugin (Plugins > Python Script). The code below works perfectly and replaces the words listed in wordlist.txt, but only for English; it cannot find non-ASCII words. I tried changing with open('C:\Users\Desktop\wordlist.txt') as f: to with io.open('C:\Users\Desktop\wordlist.txt', encoding='utf-8') as f:, but Notepad++ still does not perform the replacement for Unicode words. How can I pass a Unicode string as the search term in the code below? Alternatively, please help with Python code for a batch whole-word replacement in one text file (A), using a delimited find-and-replace word list stored in a second text file (B).

with open('C:\Users\Desktop\wordlist.txt') as f:
    for l in f:
        s = l.split()
        editor.rereplace(r'\b' + s[0] + r'\b', s[1])

Solution

  • Do not use the word boundary \b, which causes problems with UTF-8 characters. Use lookarounds instead:

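    import re

    # Each line of the word list holds: <search word> <replacement>
    with open('D:\\temp\\wordlist.txt') as f:
        for l in f:
            s = l.split()
            if len(s) < 2:
                continue  # skip blank or incomplete lines
            # Match s[0] only when no non-whitespace character touches it
            editor.rereplace(r'(?<!\S)' + s[0] + r'(?!\S)', '\t' + s[1])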
    

    Where:

    • (?<!\S) is a negative lookbehind that ensures there is no non-whitespace character immediately before the word to be modified
    • (?!\S) is a negative lookahead that ensures there is no non-whitespace character immediately after the word to be modified

    With your two sample files, I got:

        मारुती
    नामशिवाया 
        जयश्रीराम 
    जयश्रीराम 
    
    • Note: I've added a tab before the modified words for readability; remove it for your application.
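
For the second part of the question (doing the replacement in plain Python rather than through the plugin), here is a minimal sketch of the same lookaround idea using the standard re module. It assumes Python 3, UTF-8 encoded files, and a word list with one whitespace-separated "search replacement" pair per line. The function names replace_whole_words and load_pairs are hypothetical and not part of the original post or of Notepad++.

```python
import re


def replace_whole_words(text, pairs):
    """Replace each `old` word with `new` when it stands alone between whitespace.

    `pairs` is an iterable of (old, new) tuples, applied in order.
    """
    for old, new in pairs:
        # (?<!\S): no non-whitespace character right before the word
        # (?!\S):  no non-whitespace character right after the word
        # re.escape keeps characters such as '.' in the word literal.
        pattern = r'(?<!\S)' + re.escape(old) + r'(?!\S)'
        # A function replacement inserts `new` verbatim (no backslash escapes).
        text = re.sub(pattern, lambda m, new=new: new, text)
    return text


def load_pairs(path):
    """Read whitespace-separated (old, new) pairs from a UTF-8 word list."""
    pairs = []
    with open(path, encoding='utf-8') as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 2:  # skip blank or incomplete lines
                pairs.append((parts[0], parts[1]))
    return pairs
```

To use it, read file A with open(..., encoding='utf-8'), pass its contents together with load_pairs(...) for file B to replace_whole_words, and write the result back out. As with the plugin version, a word directly followed by punctuation (for example a comma) is not treated as standalone, so widen the lookarounds if your text needs that.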
