python, python-unicode, file-manipulation

Python program to change unicode characters to entities


This program is supposed to replace the Unicode symbols in the file x.input with their respective entities and write the result to y.output. However, it doesn't do that; it only creates a copy of the file.

I see this issue with both Python 2.7 and 3.5, and the platform is Windows 7.

Where am I going wrong? Please help.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
#import io

f1 = open('x.input').read()
f2 = open('y.output','w')
for line in f1:
    x = line.replace('“', '&ldquo;')
    f2.write(x)
#f1.close()
f2.close()

A screenshot of the entire program: [image: actual program with the double quote that is causing the issue]


Solution

  • The issue is a bit tricky: you have a copy/paste error from a document, where the character “ (U+201C, code point 8220, whose UTF-8 encoding starts with byte 226) is not the " you expect (ord 34). The two look similar but are different characters. Most likely you copied this example from a Word document.

    Just replace this character with the correct one and your program will work. The result should be (copy/paste from here so you get the correct character):

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    #import io
    
    f1 = open('x.input')  # keep the file object so it can be closed later
    f2 = open('y.output','w')
    for line in f1:
        # chr(34) is the plain ASCII double quote (")
        x = line.replace(chr(34), '&ldquo;')
        f2.write(x)
    f1.close()
    f2.close()
    

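    Although it is not strictly needed (the file is closed when the program finishes), it is good practice to close f1 too. Note that f1.close() only works if f1 is the file object itself, not the string returned by .read().

    Note: Edited to make the solution clearer; look at how the replace line has changed.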
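
For completeness, here is a minimal sketch of a more defensive variant that works on decoded text rather than raw bytes. It assumes x.input is UTF-8 encoded (the question does not say, so adjust the encoding if needed). The names ENTITIES, to_entities and convert_file are made up for illustration, and the mapping only covers the two curly double quotes.

```python
# -*- coding: utf-8 -*-
import io

# Hypothetical mapping for illustration; extend it with the characters you need.
ENTITIES = {
    u'\u201c': u'&ldquo;',  # left double quotation mark
    u'\u201d': u'&rdquo;',  # right double quotation mark
}


def to_entities(text):
    """Replace each mapped character in text with its HTML entity."""
    for char, entity in ENTITIES.items():
        text = text.replace(char, entity)
    return text


def convert_file(src, dst, encoding='utf-8'):
    """Read src as text and write the entity-encoded result to dst.

    The default encoding is an assumption about the input file.
    """
    # io.open decodes to unicode text on both Python 2.7 and 3.x.
    with io.open(src, encoding=encoding) as fin, \
         io.open(dst, 'w', encoding=encoding) as fout:
        for line in fin:
            fout.write(to_entities(line))
```

Calling convert_file('x.input', 'y.output') would then write the converted text. Decoding first matters here: “ is the single code point U+201C, but in a UTF-8 byte string it takes three bytes, so a character-by-character loop over raw bytes never sees it as one unit. The with statement also closes both files automatically, even if an error occurs.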