
Where is Py_FileSystemDefaultEncoding set in the Python source code?


I am curious about how the Python source code sets the value of Py_FileSystemDefaultEncoding, and I have run into something strange.

The Python documentation for sys.getfilesystemencoding() says:

On Unix, the encoding is the user’s preference according to the result of nl_langinfo(CODESET), or None if the nl_langinfo(CODESET) failed.

I am using Python 2.7.6:

```

>>> import sys
>>> sys.getfilesystemencoding()
'UTF-8'
>>> import locale
>>> locale.nl_langinfo(locale.CODESET)
'ANSI_X3.4-1968'

```
Here is the question: why does sys.getfilesystemencoding() return a different value from locale.nl_langinfo(locale.CODESET), when the documentation says that the filesystem encoding is derived from nl_langinfo(CODESET)?

Here is the locale command output in my terminal:

LANG=en_US.UTF-8
LANGUAGE=en_US:en
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=zh_CN.UTF-8
LC_TIME=zh_CN.UTF-8
LC_COLLATE="en_US.UTF-8"
LC_MONETARY=zh_CN.UTF-8
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=zh_CN.UTF-8
LC_NAME=zh_CN.UTF-8
LC_ADDRESS=zh_CN.UTF-8
LC_TELEPHONE=zh_CN.UTF-8
LC_MEASUREMENT=zh_CN.UTF-8
LC_IDENTIFICATION=zh_CN.UTF-8
LC_ALL=

Solution

  • Summary: sys.getfilesystemencoding() behaves as documented. The confusion is due to the difference between setlocale(LC_CTYPE, "") (user's preference) and the default C locale.


    A script always starts in the default C locale, so nl_langinfo(CODESET) reports the encoding of that locale:

    >>> import locale
    >>> locale.nl_langinfo(locale.CODESET)
    'ANSI_X3.4-1968'
    

    But getfilesystemencoding() uses the user's locale, which you get only after calling setlocale(LC_CTYPE, ''):

    >>> import sys
    >>> sys.getfilesystemencoding()
    'UTF-8'
    >>> locale.setlocale(locale.LC_CTYPE, '')
    'en_US.UTF-8'
    >>> locale.nl_langinfo(locale.CODESET)
    'UTF-8'
    

    An empty string as the locale name selects the locale from the relevant environment variables set by the user, so changing LC_CTYPE changes the result:

    $ LC_CTYPE=C python -c 'import sys; print(sys.getfilesystemencoding())'
    ANSI_X3.4-1968
    $ LC_CTYPE=C.UTF-8 python -c 'import sys; print(sys.getfilesystemencoding())'
    UTF-8
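
The difference between the two locales can be sketched in a few lines. This is only an illustration of the save, switch, query, restore pattern described above, not the interpreter's actual code; the helper name codeset_for is hypothetical.

```python
import locale


def codeset_for(locale_name):
    """Return nl_langinfo(CODESET) with LC_CTYPE temporarily set to locale_name.

    codeset_for is a hypothetical helper, not part of the standard library.
    The previous LC_CTYPE setting is restored before returning.
    """
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, locale_name)
        return locale.nl_langinfo(locale.CODESET)
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)


if __name__ == "__main__":
    # "C" is the locale every C program starts in.
    print("C locale:      %s" % codeset_for("C"))
    try:
        # "" selects the locale from LC_ALL / LC_CTYPE / LANG.
        print("user's locale: %s" % codeset_for(""))
    except locale.Error as exc:
        print("user's locale could not be set: %s" % exc)
```

With the environment shown in the question, the first line should correspond to the 'ANSI_X3.4-1968' result and the second to 'UTF-8', although the exact names depend on the platform's C library.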
    

    Where can I find the source code that sets Py_FileSystemDefaultEncoding?

    There are two places in the source code for Python 2.7:


    Can you give me some advice on how to search for keywords in the Python source code?

    To find these places:

    • clone the Python 2.7 source code:

      $ hg clone https://hg.python.org/cpython && cd cpython
      $ hg update 2.7
      
    • search for the regex Py_FileSystemDefaultEncoding *= in your editor, e.g.:

      $ make TAGS # to create tags table
      

      in Emacs: M-x tags-search RET Py_FileSystemDefaultEncoding *= RET and M-, to continue the search.
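
If you do not use Emacs, the same regex search can be done with a short script. The following is a minimal sketch, assuming the source tree has already been checked out locally; the function name search_tree and the choice to look only at .c and .h files are assumptions made for illustration.

```python
import os
import re


def search_tree(root, pattern, extensions=(".c", ".h")):
    """Yield (path, line_number, line) for every line under root matching pattern.

    search_tree is a hypothetical helper; only files whose names end with
    one of the given extensions are read.
    """
    regex = re.compile(pattern)
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith(extensions):
                continue
            path = os.path.join(dirpath, filename)
            # Replace undecodable bytes so odd encodings do not stop the search.
            with open(path, encoding="utf-8", errors="replace") as source:
                for lineno, line in enumerate(source, start=1):
                    if regex.search(line):
                        yield path, lineno, line.rstrip()
```

For example, iterating over search_tree("cpython", r"Py_FileSystemDefaultEncoding *=") and printing each tuple lists the assignments to the variable, where "cpython" is the directory created by the clone step.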