In what encoding are the elements of sys.argv
, in Python? are they encoded with the sys.getdefaultencoding()
encoding?
sys.getdefaultencoding(): Return the name of the current default string encoding used by the Unicode implementation.
PS: As pointed out in some of the answers, sys.stdin.encoding
would indeed be a better guess. I would love to see a definitive answer to this question, though, with pointers to solid sources!
PPS: As Wim pointed out, Python 3 solves this issue by putting str
objects in sys.argv (if I understand correctly). The question remains open for Python 2.x, though. Under Unix, the LC_CTYPE environment variable seems to be the correct thing to check, no? What should be done with Windows (so that sys.argv elements are correctly interpreted whatever the console)?
"What should be done with Windows (so that sys.argv elements are correctly interpreted whatever the console)?"
For Python 2.x, see this comment on issue2128.
(Note that no encoding is correct for the original sys.argv, because some characters may have been mangled in ways that there is not enough information to undo; for example, if the ANSI codepage cannot represent Greek alpha then it will be mangled to 'a'.)