Tags: python, macos, ascii, getopt

Python arguments are read as code


When attempting to run a Python script, for example:

python test.py --test 'Test'

getopt appeared to fail. Printing sys.argv revealed:

['test.py', '\xe2\x80\x94-test', '\xe2\x80\x9cTest\xe2\x80\x9d']

I was copying and pasting the command into Terminal on OS X. The command came from a text file that may have been saved on Windows. What could cause this? I haven't had this issue before.

If I retype the command in Terminal it works fine. Is there a way to process the arguments in the script so it interprets them correctly?


Solution

  • Your Windows editor replaced the first hyphen with an em dash and the straight quotes with typographic ("curly") quotes. Decoding the bytes in a Python 2 session shows this:

    >>> '\xe2\x80\x94-test'.decode('utf8')
    u'\u2014-test'
    >>> print '\xe2\x80\x94-test'.decode('utf8')
    —-test
    >>> '\xe2\x80\x9cTest\xe2\x80\x9d'.decode('utf8')
    u'\u201cTest\u201d'
    >>> print '\xe2\x80\x9cTest\xe2\x80\x9d'.decode('utf8')
    “Test”
    >>> import unicodedata
    >>> for u in u'\u2014\u201c\u201d':
    ...     print u, unicodedata.name(u)
    ... 
    — EM DASH
    “ LEFT DOUBLE QUOTATION MARK
    ” RIGHT DOUBLE QUOTATION MARK
    

    Use a text-oriented editor next time; a word processor is liable to replace text with 'prettier' versions.
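
A quick way to check whether pasted arguments contain such characters is sketched below for Python 3, where sys.argv already holds decoded str values and no .decode() call is needed. The helper name describe_non_ascii is made up for this illustration, and the sample list stands in for a real sys.argv.

```python
import unicodedata


def describe_non_ascii(args):
    """Return (argument index, code point, Unicode name) for each non-ASCII character."""
    found = []
    for index, arg in enumerate(args):
        for ch in arg:
            if ord(ch) > 127:
                code_point = "U+{:04X}".format(ord(ch))
                found.append((index, code_point, unicodedata.name(ch, "UNKNOWN")))
    return found


# Sample data mirroring the argv from the question (hypothetical stand-in for sys.argv).
sample_argv = ["test.py", "\u2014-test", "\u201cTest\u201d"]
for index, code_point, name in describe_non_ascii(sample_argv):
    print(index, code_point, name)
# 1 U+2014 EM DASH
# 2 U+201C LEFT DOUBLE QUOTATION MARK
# 2 U+201D RIGHT DOUBLE QUOTATION MARK
```

An empty result means the arguments are plain ASCII and the problem lies elsewhere.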

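    To repair the arguments inside the script, you could decode them and map the offending characters back with unicode.translate() (Python 2):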

    >>> import sys
    >>> sys.argv = ['test.py', '\xe2\x80\x94-test', '\xe2\x80\x9cTest\xe2\x80\x9d']
    >>> table = {0x2014: u'-', 0x201c: u"'", 0x201d: u"'"}  # code point -> ASCII replacement; avoids shadowing the built-in map()
    >>> sys.argv[1:] = [s.decode('utf8').translate(table).encode('utf8') for s in sys.argv[1:]]
    >>> sys.argv
    ['test.py', '--test', "'Test'"]
    

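    Note that the shell does not treat curly quotes as quoting characters, so a pasted argument that contains spaces is split into separate words before Python ever sees it, and the script cannot reliably put it back together. It is safer to translate the text file itself using the method above, then paste the properly quoted command into the shell.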
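
For Python 3, the same idea can be sketched with str.translate(), since sys.argv is already text there. The table ASCII_FIXES and the function normalize_args are names invented for this example. Unlike the session above, this sketch deletes the curly quotes instead of converting them, on the assumption that real quotes would have been consumed by the shell anyway. The en dash and single curly quote entries are extra guesses that do not appear in the original argv.

```python
import getopt

# Hypothetical table: code point -> replacement (None deletes the character).
ASCII_FIXES = {
    0x2013: "-",   # EN DASH (assumed; not seen in the original argv)
    0x2014: "-",   # EM DASH
    0x2018: None,  # LEFT SINGLE QUOTATION MARK (assumed)
    0x2019: None,  # RIGHT SINGLE QUOTATION MARK (assumed)
    0x201C: None,  # LEFT DOUBLE QUOTATION MARK
    0x201D: None,  # RIGHT DOUBLE QUOTATION MARK
}


def normalize_args(args):
    """Return a copy of args with typographic dashes and quotes repaired."""
    return [arg.translate(ASCII_FIXES) for arg in args]


# Stand-in for sys.argv[1:] as it would arrive in Python 3 after the bad paste.
pasted = ["\u2014-test", "\u201cTest\u201d"]
cleaned = normalize_args(pasted)
opts, rest = getopt.getopt(cleaned, "", ["test="])
print(opts, rest)
# [('--test', 'Test')] []
```

In a real script you would pass sys.argv[1:] to normalize_args() before calling getopt. Deleting U+2019 also strips apostrophes typed as curly quotes, so drop that entry if your arguments may legitimately contain them.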