I hit a wall here. I need to redirect all output to a file, and that file must be encoded in UTF-8. The problem is that when using codecs.open:
# errLog = io.open(os.path.join(os.getcwdu(),u'BashBugDump.log'), 'w',
# encoding='utf-8')
errLog = codecs.open(os.path.join(os.getcwdu(), u'BashBugDump.log'),
'w', encoding='utf-8')
sys.stdout = errLog
sys.stderr = errLog
codecs opens the file in binary mode, resulting in bare \n line terminators instead of the platform's native ones. I tried using io.open, but this does not play well with the print statement used all over the codebase (see "Python 2.7: print doesn't speak unicode to the io module?" or "python: TypeError: can't write str to text stream").
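The io.open failure can be reproduced without print at all. This is a minimal sketch, assuming a throwaway temporary file (the demo.log name is hypothetical); b'...' is a plain byte string in Python 2, which is the type print hands to the stream for ordinary string literals:

```python
import io
import os
import tempfile

# Hypothetical scratch file, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), u'demo.log')

with io.open(path, 'w', encoding='utf-8') as log:
    # Unicode text is accepted by the io text layer.
    log.write(u'unicode is accepted\n')
    try:
        # Byte strings are refused outright instead of being decoded.
        log.write(b'byte strings are rejected\n')
    except TypeError:
        rejected = True
    else:
        rejected = False
```

Here rejected ends up True, which is the same TypeError the print statement runs into.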
I am not the only one having this issue (for instance, see here), but the solution they adopted is specific to the logging module, which we do not use.
See also this "won't fix" bug in Python: https://bugs.python.org/issue2131
So what's the one right way to do this in Python 2?
Redirection is a shell operation. You don't have to change the Python code at all, but you do have to tell Python what encoding to use if redirected. That is done with an environment variable. The following code redirects both stdout and stderr to a UTF-8-encoded file:
set PYTHONIOENCODING=utf8
python test.py >out.txt 2>&1
test.py:
#coding:utf8
import sys
print u"我不喜欢你女朋友！"
print >>sys.stderr, u"你需要一个新的。"
out.txt:
我不喜欢你女朋友！
你需要一个新的。
Hexdump of out.txt:
0000: E6 88 91 E4 B8 8D E5 96 9C E6 AC A2 E4 BD A0 E5
0010: A5 B3 E6 9C 8B E5 8F 8B EF BC 81 0D 0A E4 BD A0
0020: E9 9C 80 E8 A6 81 E4 B8 80 E4 B8 AA E6 96 B0 E7
0030: 9A 84 E3 80 82 0D 0A
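To double-check the dump, the bytes can be decoded by hand. This is only a sketch: the hex string is copied from the dump above, and the variable names are illustrative.

```python
import binascii

# Bytes copied from the hexdump above, offsets removed.
hexdump = (
    "E6 88 91 E4 B8 8D E5 96 9C E6 AC A2 E4 BD A0 E5 "
    "A5 B3 E6 9C 8B E5 8F 8B EF BC 81 0D 0A E4 BD A0 "
    "E9 9C 80 E8 A6 81 E4 B8 80 E4 B8 AA E6 96 B0 E7 "
    "9A 84 E3 80 82 0D 0A"
)
raw = binascii.unhexlify(hexdump.replace(" ", ""))

# decode() raises UnicodeDecodeError if the bytes are not valid UTF-8.
text = raw.decode("utf-8")

# Both lines end in 0D 0A, i.e. text-mode newline translation happened.
crlf_count = text.count(u"\r\n")
lines = text.split(u"\r\n")
```

The decode succeeds and crlf_count is 2, so the file is UTF-8 with \r\n line terminators. Note that EF BC 81 is the full-width exclamation mark U+FF01, not an ASCII "!".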
Note: You do need to print Unicode strings for this to work. Print byte strings and you get the bytes you print.
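If you want to confirm that the variable is actually picked up when output is not going to a console, something like the following should do. It is a sketch that assumes the interpreter can launch a copy of itself through subprocess; a pipe stands in for the file redirection, and the variable names are made up for the example.

```python
import codecs
import os
import subprocess
import sys

# Copy the current environment and add the variable, like "set" does in the shell.
env = dict(os.environ, PYTHONIOENCODING='utf8')

# The child reports the encoding it chose for its (non-console) stdout.
child = 'import sys; sys.stdout.write(str(sys.stdout.encoding))'
reported = subprocess.check_output([sys.executable, '-c', child], env=env)

# Normalise spelling differences such as "utf8" versus "UTF-8".
encoding_name = codecs.lookup(reported.decode('ascii').strip()).name
```

With the variable set, encoding_name should come out as utf-8.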
codecs.open may force binary mode, but codecs.getwriter doesn't. Give it a file opened in text mode:
#coding:utf8
import sys
import codecs
sys.stdout = sys.stderr = codecs.getwriter('utf8')(open('out.txt','w'))
print u"我不喜欢你女朋友！"
print >>sys.stderr, u"你需要一个新的。"
(same output and hexdump as above)
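Why this works can be seen by wrapping something other than a real file. In this sketch, Recorder is a hypothetical stand-in stream that just remembers what it is handed:

```python
import codecs


class Recorder(object):
    """Hypothetical stand-in for a file: records every write() argument."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)


rec = Recorder()
writer = codecs.getwriter('utf8')(rec)

# The writer encodes the text and passes the bytes straight through.
writer.write(u'\u4f60\n')
```

rec.chunks now holds the single byte string b'\xe4\xbd\xa0\n'. The writer only encodes and leaves \n alone, so any newline translation is done by the underlying file, which is why opening that file in text mode gives \r\n on Windows.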