Tags: python, python-3.x, encoding, utf-8, gb2312

Why can't I save my file as UTF-8?


I want to save a string to a new txt file.

The string is encoded as UTF-8 (I think), and it contains some Chinese characters.

But the saved file is encoded as GB2312.

Here is my code, with some parts omitted:

# -*- coding:utf-8 -*-
# Python 3.4, Windows 7

def getUrl(self, url, coding='utf-8'):
    self.__reCompile = {}
    req = request.Request(url)
    req.add_header('User-Agent','Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 UBrowser/5.5.9703.2 Safari/537.36')
    with request.urlopen(req) as response:
        return response.read().decode(coding)

def saveText(self,filename,content,mode='w'):
    self._checkPath(filename)
    with open(filename,mode) as f:
        f.write(content)

joke = self.getUrl(pageUrl)
# some re transforms, such as re.sub('<br>', '\r\n', joke)
self.saveText(filepath+'.txt',joke,'a')

Sometimes I get a UnicodeEncodeError.


Solution

  • Your exception is thrown in 'saveText'. I can't see exactly how you implemented it, so I'll try to reproduce the error and then suggest a fix.

    In 'getUrl' you return a decoded string (.decode('utf-8')), and my guess is that in 'saveText' you forget to encode it before writing to the file.
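
For Python 3, which the question uses, it may help to check which encoding the built-in open() picks when none is given. In text mode it normally falls back to locale.getpreferredencoding(False), which on a Chinese-locale Windows machine is typically cp936 (GBK, a superset of GB2312). That would explain both the GB2312-looking file and the occasional UnicodeEncodeError for characters outside that code page. A minimal sketch; the file name probe.txt is made up for illustration:

```python
import locale
import os
import tempfile

# Encoding that open() normally uses in text mode when no encoding= is passed
default_encoding = locale.getpreferredencoding(False)
print("default text encoding:", default_encoding)

# The file object reports the encoding it actually uses
path = os.path.join(tempfile.mkdtemp(), "probe.txt")  # hypothetical file name
with open(path, "w") as f:
    implicit_encoding = f.encoding
with open(path, "w", encoding="utf-8") as f:
    explicit_encoding = f.encoding

print("implicit:", implicit_encoding)
print("explicit:", explicit_encoding)
```

If the implicit value is not UTF-8 on your machine, any open() call without an encoding argument will write in that locale encoding instead.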

    Reproducing the error

    To reproduce the error, I ran this (Python 2 syntax; in Python 3, str has no .decode() method):

    # String with unicode chars, decoded like in your example
    s = 'æøå'.decode('utf-8') 
    
    # How saveText could be:
    # Encode before write
    f = open('test', mode='w')
    f.write(s)
    f.close()
    

    This gives a similar exception:

    ---------------------------------------------------------------------------
    UnicodeEncodeError                        Traceback (most recent call last)
    <ipython-input-36-1309da3ad975> in <module>()
          5 # Encode before write
          6 f = open('test', mode='w')
    ----> 7 f.write(s)
          8 f.close()
    
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
    

    Two ways of fixing it

    You can do either:

    # String with unicode chars, decoded like in your example
    s = 'æøå'.decode('utf-8') 
    
    # How saveText could be:
    # Encode before write
    f = open('test', mode='w')
    f.write(s.encode('utf-8'))
    f.close()
    

    Or you can write the file with the 'codecs' module, which encodes the string for you:

    import codecs
    
    # String with unicode chars, decoded like in your example
    s = 'æøå'.decode('utf-8') 
    
    # How saveText could be:
    f = codecs.open('test', encoding='utf-8', mode='w')
    f.write(s)  
    f.close()
    

    Hope this helps.
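
In Python 3 the same idea can be expressed by passing encoding to the built-in open(), so neither a manual .encode() call nor the codecs import is needed. Below is a sketch of how saveText might look under that assumption, written as a standalone function. The name save_text, the encoding parameter, and the sample text are additions for illustration, and the _checkPath call from the question is left out:

```python
import os
import tempfile


def save_text(filename, content, mode="w", encoding="utf-8"):
    """Write text to filename with an explicit encoding, not the locale default."""
    with open(filename, mode, encoding=encoding) as f:
        f.write(content)


# Usage: write, then append, then read the raw bytes back to check the encoding
path = os.path.join(tempfile.mkdtemp(), "joke.txt")  # hypothetical file name
save_text(path, "笑话")
save_text(path, " æøå", mode="a")

with open(path, "rb") as f:
    raw = f.read()

# Decoding as UTF-8 succeeds only if the file really was written as UTF-8
print(raw.decode("utf-8"))
```

Reading the file back in binary mode is a simple way to confirm what was actually written, independent of any editor's guess about the encoding.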