python-2.7 python-unicode stringio

StringIO getvalue raising UnicodeDecodeError when printing traceback


The call:

deprint(_(u'Error finding icon for %s:') % target.s, traceback=True)

where:

def deprint(*args,**keyargs):
    msg = u''  # start from an empty unicode message
    try:
        msg += u' '.join([u'%s'%x for x in args])
    except UnicodeError:
        # If the args failed to convert to unicode for some reason
        # we still want the message displayed any way we can
        for x in args:
            try:
                msg += u' %s' % x
            except UnicodeError:
                msg += u' %s' % repr(x)

    if keyargs.get('traceback',False):
        o = StringIO.StringIO(msg)
        o.write(u'\n')
        traceback.print_exc(file=o)
        value = o.getvalue()
        try:
            msg += u'%s' % value
        except UnicodeError:
            msg += u'%s' % repr(value)
        o.close()
    #...

Fails with:

Traceback (most recent call last):
  File "Wrye Bash Launcher.pyw", line 87, in <module>
  File "bash\bash.pyo", line 574, in main
  File "bash\basher.pyo", line 18921, in InitLinks
  File "bash\basher.pyo", line 18291, in InitStatusBar
  File "bash\bolt.pyo", line 2470, in deprint
  File "StringIO.pyo", line 271, in getvalue
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 37: ordinal not in range(128)

for a user with a French locale.

Line 2470 in deprint corresponds to value = o.getvalue(). The relevant method in StringIO:

def getvalue(self):
    """
    Retrieve the entire contents of the "file" at any time before
    the StringIO object's close() method is called.

    The StringIO object can accept either Unicode or 8-bit strings,
    but mixing the two may take some care. If both are used, 8-bit
    strings that cannot be interpreted as 7-bit ASCII (that use the
    8th bit) will cause a UnicodeError to be raised when getvalue()
    is called.
    """
    _complain_ifclosed(self.closed)
    if self.buflist:
        self.buf += ''.join(self.buflist) # line 271
        self.buflist = []
    return self.buf

  1. Where is the mixing of strings taking place? As far as I can see, I always pass unicode in.
  2. How should I rewrite the traceback.print_exc(file=o) call to be bulletproof?

This:

     if keyargs.get('traceback',False):
-        o = StringIO.StringIO(msg)
-        o.write(u'\n')
+        o = StringIO.StringIO()
         traceback.print_exc(file=o)

does the trick, but the questions still stand.


Solution

  • This:

    if keyargs.get('traceback',False):
        o = StringIO.StringIO()
        traceback.print_exc(file=o)
        value = o.getvalue()
        try:
            msg += u'\n%s' % unicode(value, 'utf-8')
        except UnicodeError:
            traceback.print_exc()
            msg += u'\n%s' % repr(value)
        o.close()
    

    solves it and avoids the UnicodeError that msg += u'%s' % value would raise. So yes, I was indeed mixing byte strings and unicode: the buffer started out as unicode (msg), and the byte strings got in through the traceback.print_exc(file=o) call.
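To make the mixing concrete: in Python 2, combining a unicode object with a byte str implicitly decodes the byte string as ASCII, which is what ''.join(self.buflist) ends up doing once both types are in the buffer. The sketch below spells that decode out explicitly so it behaves the same on Python 2 and 3. The sample text and the cp1252 encoding are assumptions chosen to resemble a French Windows locale, not data from the failing run.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample: a localized message as a byte string (cp1252 assumed).
raw = u'Acc\xe8s refus\xe9'.encode('cp1252')


def implicit_ascii_decode(data):
    """Mimic the ASCII decode Python 2 applies when a byte str meets unicode."""
    return data.decode('ascii')


try:
    implicit_ascii_decode(raw)
    error = None
except UnicodeDecodeError as exc:
    error = exc  # 'ascii' codec can't decode byte 0xe8 in position 3

# Decoding with the encoding the bytes were actually written in succeeds.
recovered = raw.decode('cp1252')
```

The failure only shows up for non-ASCII bytes, which would explain why the same code runs fine until a localized message or an accented path ends up in the traceback.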

    I am still not sure whether 'utf-8' in unicode(value, 'utf-8') is the right encoding. I guess that tracebacks from built-in exceptions would decode fine, while my custom exceptions have unicode messages. Still, could a Windows error on a path be encoded in mbcs? I don't know, but this will do for now.

    Input (or a better solution) is appreciated.
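One possible way to hedge against the encoding uncertainty is to skip StringIO, take the traceback from traceback.format_exc(), and decode it with a list of candidate encodings before falling back to a lossy decode. This is only a sketch: to_unicode and traceback_text are hypothetical helper names, and the order of candidates (UTF-8 first, then the locale's preferred encoding) is an assumption, not something the traceback module guarantees.

```python
import locale
import traceback


def to_unicode(value, encodings=('utf-8',)):
    """Hypothetical helper: return text, decoding byte strings defensively."""
    if isinstance(value, type(u'')):
        return value  # already unicode, nothing to decode
    # Assumption: the locale's preferred encoding is a sensible second guess.
    candidates = list(encodings) + [locale.getpreferredencoding()]
    for encoding in candidates:
        try:
            return value.decode(encoding)
        except (UnicodeError, LookupError):
            continue
    # Last resort: never raise, replace undecodable bytes instead.
    return value.decode('ascii', 'replace')


def traceback_text():
    """Hypothetical helper: the current exception's traceback as unicode."""
    return to_unicode(traceback.format_exc())
```

Inside deprint this might be used as msg += u'\n' + traceback_text() within the traceback branch. The 'replace' fallback trades fidelity for never raising, which seems acceptable for a debug printer.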