python windows python-2.7 encoding carriage-return

How can I add r'"' to each line of a file if it doesn't end with r'"'?


I want every line of a .txt file to end with ". The file is encoded in gb2312 or gbk, since it contains Chinese text. I created a file named heheda.txt with the following content (each line ends with a carriage return):

从前有座山"
shan里有个庙
"庙里有个"
laohe尚

Then what I tried is as follows:

for line in open('heheda.txt', 'r'):
    if not line[-2] == r'"':
        print line
        line = line[:-1] + r'"' + line[-1:]
        print line

and it returns:

shan里有个庙

shan里有个庙"

laohe尚
laohe�"�

I don't understand why the last character of each line is line[-2]; I had first tried line.endswith(r'"') and line[-1] == r'"'. Also, the first sentence comes out in the right format, while the second one comes out garbled.

Then I tried reading the file in binary mode with rb, which surprised me again:

a_file = open(data_path+'heheda.txt', 'rb')
for line in a_file:
    if line[-3] != r'"':
        print line
        line = line[:-2] + r'"' + line[-2:]
        print line

and it returns:

shan里有个庙

shan里有个庙"

laohe尚
laohe�"��

This time I have to use line[-3] != r'"' as the condition to check whether a line ends with ". I cannot figure out what is happening. By the way, I am working on Windows 7 with Python 2.7.11.

Does anyone know what's going on??


Solution

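  • Windows uses "\r\n" as the newline sequence, and text mode ('r') translates it to "\n" automatically. That is why the last visible character is line[-2] in text mode and line[-3] in binary mode ('rb'). But your last line has no newline character at all, so the same slice cuts through the bytes of its final multi-byte character, which is why it prints as garbage.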

    Just strip newline characters and then test for ":

    with open('heheda.txt', 'r') as lines:
        for line in lines:
            line = line.rstrip()
            if not line.endswith('"'):
                line += '"'
            print line
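
If the result needs to be written back to a file rather than printed, one option is to decode the file to text first, so that a multi-byte character can never be split by slicing. The following is only a minimal sketch, assuming the file really is GBK-encoded as the question states. The names ensure_trailing_quote and quote_lines and the separate output path are hypothetical, introduced here for illustration. It uses io.open, which accepts an encoding argument on both Python 2.7 and Python 3.

```python
import io


def ensure_trailing_quote(line):
    """Return line without its line ending, guaranteed to end with a quote."""
    # Strip only newline characters, so other trailing whitespace is kept.
    text = line.rstrip(u'\r\n')
    if not text.endswith(u'"'):
        text += u'"'
    return text


def quote_lines(src_path, dst_path, encoding='gbk'):
    """Copy src_path to dst_path so that every line ends with a quote.

    The default encoding is an assumption taken from the question;
    pass a different one if the file is stored differently.
    """
    # io.open decodes bytes to text, so each Chinese character is a single
    # item and cannot be cut in half the way a raw byte slice can.
    with io.open(src_path, 'r', encoding=encoding) as src:
        fixed = [ensure_trailing_quote(line) for line in src]
    with io.open(dst_path, 'w', encoding=encoding) as dst:
        for text in fixed:
            # Text mode translates u'\n' to the platform newline on write.
            dst.write(text + u'\n')
```

Calling quote_lines('heheda.txt', 'heheda_fixed.txt') would then write the corrected lines to a second file (the output name is made up here). Writing to a separate file keeps the original untouched in case the encoding guess turns out to be wrong.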