When attempting to run a Python script, for example:
python test.py --test 'Test'
it appeared that getopt was failing, and printing sys.argv revealed:
['test.py', '\xe2\x80\x94-test', '\xe2\x80\x9cTest\xe2\x80\x9d']
I was copying and pasting the command into Terminal on OS X from a text file that may have been saved on Windows. What could cause this? I haven't had this issue before.
If I retype the command in Terminal it works fine. Is there a way to process the arguments in the script so it interprets them correctly?
Your Windows editor replaced the first of the two regular hyphens with an em dash, and the straight quotes with typographic ("smart") quotes:
>>> '\xe2\x80\x94-test'.decode('utf8')
u'\u2014-test'
>>> print '\xe2\x80\x94-test'.decode('utf8')
—-test
>>> '\xe2\x80\x9cTest\xe2\x80\x9d'.decode('utf8')
u'\u201cTest\u201d'
>>> print '\xe2\x80\x9cTest\xe2\x80\x9d'.decode('utf8')
“Test”
>>> import unicodedata
>>> for u in u'\u2014\u201c\u201d':
...     print u, unicodedata.name(u)
...
— EM DASH
“ LEFT DOUBLE QUOTATION MARK
” RIGHT DOUBLE QUOTATION MARK
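The session above is Python 2. On Python 3, where sys.argv already holds decoded str objects (assuming a UTF-8 locale), a small diagnostic along these lines could list every suspicious character in the arguments. describe_non_ascii is a hypothetical helper name, not part of any library:

```python
import unicodedata


def describe_non_ascii(args):
    """Return (argument index, character, Unicode name) for each non-ASCII character."""
    report = []
    for index, arg in enumerate(args):
        for ch in arg:
            if ord(ch) > 127:
                # unicodedata.name raises ValueError for unnamed code points
                # unless a default is supplied.
                report.append((index, ch, unicodedata.name(ch, "UNKNOWN")))
    return report


if __name__ == "__main__":
    # The same arguments as in the question, already decoded to str.
    sample = ["test.py", "\u2014-test", "\u201cTest\u201d"]
    for index, ch, name in describe_non_ascii(sample):
        print(index, repr(ch), name)
```

Calling it on sys.argv at the top of the script would have pointed straight at the em dash and the curly quotes.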
Use a text-oriented editor next time; a word processor is liable to replace text with 'prettier' versions.
You could map these characters back to their ASCII equivalents with unicode.translate() calls:
>>> import sys
>>> sys.argv = ['test.py', '\xe2\x80\x94-test', '\xe2\x80\x9cTest\xe2\x80\x9d']
>>> table = {0x2014: u'-', 0x201c: u"'", 0x201d: u"'"}
>>> sys.argv[1:] = [s.decode('utf8').translate(table).encode('utf8') for s in sys.argv[1:]]
>>> sys.argv
['test.py', '--test', "'Test'"]
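On Python 3 there is no need to decode first, because sys.argv already contains str objects (assuming a UTF-8 locale). A minimal sketch using str.maketrans follows; the EN DASH and single curly quotes are extra entries added on the assumption that a word processor may insert them too, and normalize_args is a hypothetical helper name:

```python
import sys

# Typographic characters -> plain ASCII. EM DASH and the double quotes come
# from the question; EN DASH and the single quotes are assumed extras.
SMART_PUNCTUATION = str.maketrans({
    "\u2013": "-",   # EN DASH
    "\u2014": "-",   # EM DASH
    "\u2018": "'",   # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",   # RIGHT SINGLE QUOTATION MARK
    "\u201c": "'",   # LEFT DOUBLE QUOTATION MARK
    "\u201d": "'",   # RIGHT DOUBLE QUOTATION MARK
})


def normalize_args(argv):
    """Return a copy of argv with typographic punctuation replaced by ASCII.

    The script name (argv[0]) is kept unchanged; argv[:1] also tolerates
    an empty list.
    """
    return argv[:1] + [arg.translate(SMART_PUNCTUATION) for arg in argv[1:]]


if __name__ == "__main__":
    # Normalize before passing sys.argv[1:] to getopt or argparse.
    sys.argv = normalize_args(sys.argv)
    print(sys.argv)
```

As in the Python 2 version, the curly double quotes become straight single quotes, so the quote characters end up inside the argument value.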
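Note that this cannot fully repair the arguments: the shell splits the command line before Python ever sees it, and because it found no regular quotes, a quoted argument containing whitespace will already have been split into several arguments. It is better to translate your text file with the above method first, then paste the properly quoted commands into the shell.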
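The whitespace problem can be seen with shlex.split, which approximates POSIX shell word splitting. This Python 3 sketch maps the curly double quotes to straight double quotes (rather than single quotes) so the shell can group words again; clean_command and clean_file are hypothetical helper names, and reading with "utf-8-sig" assumes the file is UTF-8, possibly with a BOM as some Windows editors write:

```python
import shlex

# Curly double quotes become straight double quotes here, so that a
# pasted command keeps its shell quoting.
TABLE = str.maketrans({
    "\u2014": "-",   # EM DASH
    "\u201c": '"',   # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',   # RIGHT DOUBLE QUOTATION MARK
})


def clean_command(text):
    """Replace typographic punctuation in pasted command text."""
    return text.translate(TABLE)


def clean_file(src_path, dst_path):
    """Write a cleaned copy of a text file (paths are placeholders)."""
    with open(src_path, encoding="utf-8-sig") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(clean_command(text))


if __name__ == "__main__":
    pasted = "python test.py \u2014-test \u201cTwo words\u201d"
    # Without cleaning, the curly quotes are ordinary characters, so the
    # quoted phrase is split at the space.
    print(shlex.split(pasted))
    # After cleaning, the straight quotes group the phrase into one argument.
    print(shlex.split(clean_command(pasted)))
```

The first call yields five words, with the curly quotes stuck to "Two" and "words"; the second yields ['python', 'test.py', '--test', 'Two words'].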