
python -m <filename> <encoding>?


How to specify encoding when running python script as a module?

For example, I want to run my_script.py as python -m my_script -utf8, but there is no such option. Instead, I have to declare the encoding at the top of my_script.py, and that fails with some Python 2.7 packages.

Consider the following scenario:

my_script.py:

    # coding=utf-8
    from pyglet.gl import *

  1. $ cd ~/Documents
  2. Create a non-ASCII folder: $ mkdir вафля
  3. $ cd вафля
  4. Create my_script.py with the code above
  5. $ python my_script.py -- works well
  6. $ python -m my_script -- fails

Work station: Ubuntu 14.04.3 x64 + Python 2.7.6 x64 (built-in)

Please don't suggest switching to Python 3.4: I've already done that, and I just want to support both Python 2.7 and 3.4.

Here is the traceback:

  File "my_script.py", line 22, in <module>
    from pyglet.gl import *
  File "/usr/local/lib/python2.7/dist-packages/pyglet/gl/__init__.py", line 236, in <module>
    import pyglet.window
  File "/usr/local/lib/python2.7/dist-packages/pyglet/window/__init__.py", line 1817, in <module>
    gl._create_shadow_window()
  File "/usr/local/lib/python2.7/dist-packages/pyglet/gl/__init__.py", line 205, in _create_shadow_window
    _shadow_window = Window(width=1, height=1, visible=False)
  File "/usr/local/lib/python2.7/dist-packages/pyglet/window/xlib/__init__.py", line 163, in __init__
    super(XlibWindow, self).__init__(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/pyglet/window/__init__.py", line 559, in __init__
    self._create()
  File "/usr/local/lib/python2.7/dist-packages/pyglet/window/xlib/__init__.py", line 353, in _create
    self.set_caption(self._caption)
  File "/usr/local/lib/python2.7/dist-packages/pyglet/window/xlib/__init__.py", line 511, in set_caption
    self._set_text_property('WM_NAME', caption, allow_utf8=False)
  File "/usr/local/lib/python2.7/dist-packages/pyglet/window/xlib/__init__.py", line 785, in _set_text_property
    buf = create_string_buffer(value.encode('ascii', 'ignore'))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 19: ordinal not in range(128)

Solution

  • This appears to be a bug in pyglet. It uses sys.argv[0] as its default window caption, but it expects the caption to be a unicode instance, which it later encodes to ASCII (ignoring characters that cannot be represented). However, in Python 2, sys.argv[0] is a bytestring (a str instance) in some encoding (I'm not sure whether that encoding is specified anywhere, or whether it can vary from filesystem to filesystem). When you call encode on an already-encoded bytestring, Python 2 first tries to decode it to a unicode object using the ASCII codec, and only then encodes it as requested. The 'ignore' error handler applies only to that second step, so the implicit decode is what raises the UnicodeDecodeError on byte 0xd0 in your traceback.
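    The implicit step described above can be modelled in a few lines. This is a sketch, not CPython's actual code: the hypothetical `py2_str_encode` mimics what Python 2's `str.encode` does (coerce to unicode with the default encoding, normally ASCII, using strict errors, then encode), and it is written so it also runs on Python 3, where the behaviour can be checked directly. The path and user name in the demo are made up.

```python
# coding=utf-8
def py2_str_encode(raw, encoding, errors="strict"):
    """Hypothetical model of Python 2's str.encode() on a bytestring.

    Python 2 first coerces the bytes to unicode using the default encoding
    (normally 'ascii') with 'strict' errors, and only then encodes. The
    `errors` argument applies to the second step only.
    """
    text = raw.decode("ascii")            # implicit step: raises on any byte >= 0x80
    return text.encode(encoding, errors)  # the step the caller actually asked for


if __name__ == "__main__":
    # Roughly what sys.argv[0] holds under `python -m` (hypothetical user name).
    argv0 = u"/home/user/вафля/my_script.py".encode("utf-8")
    try:
        py2_str_encode(argv0, "ascii", "ignore")
    except UnicodeDecodeError as exc:
        print("fails like pyglet: %s" % exc)
    # Decoding explicitly first lets 'ignore' do its job.
    print(argv0.decode("utf-8").encode("ascii", "ignore"))
```

    Under this model the error names byte 0xd0, the first byte of "в" in UTF-8, just like the traceback in the question.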

    The bug only bites you when you use the -m flag because, of the invocations you tested, that is the only one where the non-ASCII part of the path ends up in sys.argv[0]. When you run python my_script.py, sys.argv[0] is "my_script.py". When you use -m, sys.argv[0] is the absolute path to the script file, including the non-ASCII folder.
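    The difference in sys.argv[0] can be checked without pyglet. The sketch below is a hypothetical helper script (the names `argv0_for` and `main` are illustrative): it creates a temporary non-ASCII folder holding a one-line stand-in for my_script.py that prints sys.argv[0], then runs it both ways with the current interpreter. It assumes the filesystem can store the folder name, e.g. under a UTF-8 locale.

```python
# coding=utf-8
"""Hypothetical helper: show sys.argv[0] for `python my_script.py` vs `python -m my_script`."""
import io
import os
import shutil
import subprocess
import sys
import tempfile

# Stand-in for my_script.py: it only reports what sys.argv[0] is.
SCRIPT = u"import sys\nprint(sys.argv[0])\n"


def argv0_for(args, cwd):
    """Run this interpreter with `args` inside `cwd`; return the child's sys.argv[0]."""
    env = dict(os.environ, PYTHONIOENCODING="utf-8")  # force UTF-8 output from the child
    out = subprocess.check_output([sys.executable] + args, cwd=cwd, env=env)
    return out.decode("utf-8").strip()


def main():
    base = tempfile.mkdtemp()
    folder = os.path.join(base, u"вафля")  # non-ASCII folder, as in the question
    os.mkdir(folder)
    try:
        with io.open(os.path.join(folder, u"my_script.py"), "w", encoding="utf-8") as f:
            f.write(SCRIPT)
        direct = argv0_for(["my_script.py"], folder)
        as_module = argv0_for(["-m", "my_script"], folder)
        return direct, as_module
    finally:
        shutil.rmtree(base)  # clean up the temporary folder


if __name__ == "__main__":
    direct, as_module = main()
    print(repr(direct))
    print(repr(as_module))
```

    If the explanation above holds, `direct` comes back as "my_script.py" while `as_module` is an absolute path ending in вафля/my_script.py.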

    I'm not sure what a proper fix would be since, as mentioned above, I don't know whether the encoding of sys.argv is well specified in Python 2. If you only need to fix the issue on your own system, you can probably change these lines in pyglet/window/__init__.py (they should be roughly lines 555-556):

            if caption is None:
                caption = sys.argv[0]
    

    To:

            if caption is None:
                caption = sys.argv[0].decode("utf8")
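    If the patch needs to work on both Python 2 and Python 3 (as the question requires), a slightly more defensive variant decodes only when it actually has a bytestring, and guesses the filesystem encoding instead of hard-coding UTF-8. This is a sketch under that assumption; `default_caption` is a hypothetical helper, not part of pyglet.

```python
import sys


def default_caption(argv0=None):
    """Hypothetical helper: return sys.argv[0] as text on Python 2 and 3."""
    if argv0 is None:
        argv0 = sys.argv[0]
    # On Python 2 `bytes` is an alias of `str`, so this catches the bytestring case;
    # on Python 3 sys.argv already holds text and is returned unchanged.
    if isinstance(argv0, bytes):
        encoding = sys.getfilesystemencoding() or "utf-8"
        # 'replace' avoids a new UnicodeDecodeError if the encoding guess is wrong.
        argv0 = argv0.decode(encoding, "replace")
    return argv0
```

    With this helper the patched line would read `caption = default_caption()`. Because the result is always text, pyglet's later `value.encode('ascii', 'ignore')` drops the non-ASCII characters instead of raising.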