Runtime Environment: Python 2.7, Windows 7
NOTE: I am talking about the encoding of the file generated by the Python code (NOT the Python source file's own encoding); the encoding declared at the top of the Python source file DID agree with the encoding in which that source file was saved.
When there are no non-ASCII characters in the string (content = 'abc'), the file (file.txt, NOT the Python source file) is saved in ANSI encoding after fp.close(). The Python source file (itself saved in ANSI) is as below:
## Author: melo
## Email: prevision@imsrch.tk
## Date: 2012/10/12

import os

def write_file(filepath, mode, content):
    try:
        fp = open(filepath, mode)
        try:
            print 'file encoding:', fp.encoding
            print 'file mode:', fp.mode
            print 'file closed?', fp.closed
            fp.write(content)
        finally:
            fp.close()
            print 'file closed?', fp.closed
    except IOError, e:
        print e

if __name__ == '__main__':
    filepath = os.path.join(os.getcwd(), 'file.txt')
    content = 'abc'
    write_file(filepath, 'wb', content)
But when there are non-ASCII characters in the string (content = 'abc莹'), the file (file.txt) is saved in UTF-8 encoding after fp.close(), even though I declared the encoding at the top of the Python source file (not file.txt) with # encoding=gbk. In this case the Python source file's content is as below:
# -*- encoding: gbk -*-
## Author: melo
## Email: prevision@imsrch.tk
## Date: 2012/10/12

import os

def write_file(filepath, mode, content):
    try:
        fp = open(filepath, mode)
        try:
            print 'file encoding:', fp.encoding
            print 'file mode:', fp.mode
            print 'file closed?', fp.closed
            fp.write(content)
        finally:
            fp.close()
            print 'file closed?', fp.closed
    except IOError, e:
        print e

if __name__ == '__main__':
    filepath = os.path.join(os.getcwd(), 'file.txt')
    content = 'abc莹'
    write_file(filepath, 'wb', content)
Is there any proof that it behaves like this?
A file is saved in the encoding you save it in. A source file is saved in the encoding you save it in. They don't have to be the same; they just should be declared.
Per your other question, I assume you are using Notepad++, and when you open file.txt you find that Notepad++ thinks the file is encoded in UTF-8 without BOM. This is an incorrect guess by Notepad++. Select the Chinese GB2312 character set and the file will display properly.
Unless given a hint by a byte order mark (BOM) or some other metadata or told by the user, programs have no idea what encoding a file is in.
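To see why detection is guesswork, here is a small sketch (not part of the original answer) showing that the very same bytes "decode" successfully under more than one codec, with only one interpretation being the intended text:

```python
# -*- coding: utf-8 -*-
# Illustrative sketch: the raw GBK bytes of 'abc莹' carry no marker saying
# "this is GBK" -- a program reading them can only guess the encoding.
data = u'abc莹'.encode('gbk')        # 'abc' plus a two-byte GBK sequence
print(repr(data.decode('gbk')))      # the intended text
print(repr(data.decode('latin-1')))  # also decodes without error, but as mojibake
```

Because Latin-1 assigns a character to every possible byte, decoding with it never fails; that is exactly why "it decoded without an error" proves nothing about which encoding was used.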
A correct Python program would do these things: declare the encoding of the source file, use Unicode strings for text, and state an explicit encoding when opening the output file.
Example:
# encoding: utf-8
import codecs

with codecs.open('file.txt', 'wb', encoding='utf-8-sig') as f:
    f.write(u'abc莹')
You should now see in Notepad++ that file.txt is detected as encoded in 'UTF-8' (with BOM), and it will display properly.
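A quick way to confirm the BOM is really there is to read the file back in binary mode (a sketch, reusing the same file.txt path as above):

```python
# -*- coding: utf-8 -*-
# Sketch: 'utf-8-sig' prepends the 3-byte UTF-8 BOM (EF BB BF), which is
# exactly the hint Notepad++ needs to identify the encoding with certainty.
import codecs

with codecs.open('file.txt', 'wb', encoding='utf-8-sig') as f:
    f.write(u'abc莹')

with open('file.txt', 'rb') as f:
    raw = f.read()

print(raw[:3] == codecs.BOM_UTF8)  # True: the file starts with the BOM
```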
Note that you can save the file in 'ANSI' (GBK on your system) if you declare the encoding as gbk, and it will still work because Unicode strings were used.
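For completeness, that GBK variant might look like the sketch below; because f.write() is handed a Unicode string, codecs.open() can transcode it to whichever target encoding is declared:

```python
# -*- coding: utf-8 -*-
# Sketch: the same write, but transcoded to GBK instead of UTF-8.
import codecs

with codecs.open('file.txt', 'wb', encoding='gbk') as f:
    f.write(u'abc莹')

with open('file.txt', 'rb') as f:
    gbk_bytes = f.read()

print(repr(gbk_bytes))  # 'abc' in ASCII plus a two-byte GBK sequence
```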
Actually, your system probably uses code page 936 (cp936) rather than GBK; they aren't precisely the same. It is better to use a Unicode encoding like UTF-8 or UTF-16, which can represent all Unicode characters accurately.
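That advantage is easy to demonstrate (a sketch; the emoji is just an arbitrary character outside GBK's repertoire):

```python
# -*- coding: utf-8 -*-
# Sketch: a legacy code page raises an error for characters it cannot map,
# while UTF-8 and UTF-16 round-trip any Unicode text.
text = u'abc\U0001F600'  # includes a character GBK cannot represent

try:
    text.encode('gbk')
    gbk_ok = True
except UnicodeEncodeError:
    gbk_ok = False

print(gbk_ok)  # False: gbk cannot encode this text
for enc in ('utf-8', 'utf-16'):
    assert text.encode(enc).decode(enc) == text
```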