I want each line of a .txt file to end with ". The file is encoded in gb2312 or gbk, since it includes Chinese. So I created a file named heheda.txt, whose content is as follows (the end of each line contains a return):
从前有座山"
shan里有个庙
"庙里有个"
laohe尚
Then what I tried is as follows:
    for line in open('heheda.txt', 'r'):
        if not line[-2] == r'"':
            print line
            line = line[:-1] + r'"' + line[-1:]
            print line
and it returns:
shan里有个庙
shan里有个庙"
laohe尚
laohe�"�
I don't know why the end of each line is line[-2], given that I have also tried line.endswith(r'"') and line[-1] == r'"'. Also, the first sentence comes out in the right format, while the second one ends up with something wrong (�).
Then I tried reading in binary mode with rb, which surprised me again:
    a_file = open(data_path+'heheda.txt', 'rb')
    for line in a_file:
        if line[-3] != r'"':
            print line
            line = line[:-2] + r'"' + line[-2:]
            print line
and it returns:
shan里有个庙
shan里有个庙"
laohe尚
laohe�"��
This time, I have to use line[-3] != r'"' as the condition to judge whether a sentence ends with " or not.
I cannot figure out what is happening.
By the way, I am working on Windows 7 with Python 2.7.11.
Does anyone know what's going on?
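Windows uses "\r\n" as the newline, which is automatically translated to "\n" in text-reading mode. That is why the last character of a line is line[-2] in text mode and line[-3] in binary mode. But your last line has no newline character, so the same slicing cuts into the final two-byte Chinese character.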
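To make the effect on the last line concrete, here is a minimal sketch that applies the question's text-mode slicing to the GBK bytes of that line. It assumes the line really has no trailing newline; the names last and broken are made up for illustration.

```python
# The last line of the file as GBK bytes, with no trailing newline.
# u'\u5c1a' is the final Chinese character; GBK stores it as two bytes.
last = u'laohe\u5c1a'.encode('gbk')

# Same slicing as in the question: it assumes the final byte is a newline.
broken = last[:-1] + b'"' + last[-1:]

# The quote now sits between the two bytes of the final character,
# which is why the terminal shows replacement characters around it.
print(repr(last))
print(repr(broken))
```

On a line that does end with "\n", the same slice removes and re-appends only the newline, so the text stays intact.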
Just strip the newline characters and then test for ":
    with open('heheda.txt', 'r') as lines:
        for line in lines:
            line = line.rstrip()
            if not line.endswith('"'):
                line += '"'
            print line
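As a variation on the same idea, the file can be decoded while reading, so that every index refers to a whole character rather than a byte. This is only a sketch under assumptions: it uses io.open (available in Python 2.6+ and Python 3), assumes the file is valid GBK, and the helper names with_trailing_quote and quoted_lines are hypothetical.

```python
import io


def with_trailing_quote(text):
    """Drop the line ending and append a double quote if it is missing."""
    # Strip only "\r" and "\n", so other trailing whitespace is preserved.
    text = text.rstrip(u'\r\n')
    if not text.endswith(u'"'):
        text += u'"'
    return text


def quoted_lines(path, encoding='gbk'):
    """Decode the file at path and return its lines, each ending with a quote."""
    with io.open(path, 'r', encoding=encoding) as lines:
        return [with_trailing_quote(line) for line in lines]
```

Calling quoted_lines('heheda.txt') would return unicode strings. Unlike line.rstrip() with no argument, this version keeps trailing spaces, which may or may not be what you want.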