Setting a UTF-8 locale for Python on Ubuntu 12.04


On an Ubuntu 12.04 VM (set up using Vagrant and the hashicorp/precise64 box), locale reports LANG=en_US.UTF-8, but Python is getting a latin-1 environment.

Here's what I'm seeing:

vagrant@vagrant:~$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US"
LC_NUMERIC="en_US"
LC_TIME="en_US"
LC_COLLATE="en_US"
LC_MONETARY="en_US"
LC_MESSAGES="en_US"
LC_PAPER="en_US"
LC_NAME="en_US"
LC_ADDRESS="en_US"
LC_TELEPHONE="en_US"
LC_MEASUREMENT="en_US"
LC_IDENTIFICATION="en_US"
LC_ALL=en_US
vagrant@vagrant:~$ python
Python 2.7.3 (default, Feb 27 2014, 19:58:35)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> print u'\u1f41'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u1f41' in position 0: ordinal not in range(256)

How can I get a true UTF-8 system environment for Python?


Solution

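  • The locale output should show en_US.UTF-8 for LC_CTYPE. Here LC_ALL=en_US overrides LANG, and a bare en_US (no codeset) selects ISO-8859-1 (latin-1), which is the encoding Python picks up. Try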

    export LC_ALL="en_US.UTF-8"
    

    If that does not work (for example, because LC_CTYPE is set explicitly elsewhere), also set:

    export LC_CTYPE="en_US.UTF-8"
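
To confirm that the exports took effect, it may help to ask Python which encodings it derived from the environment. The following is a minimal sketch, not part of the original answer; the helper names describe_encodings and can_encode are hypothetical and introduced only for illustration. It is written to run under both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

import locale
import sys


def describe_encodings():
    """Return the encodings Python derived from the locale environment."""
    return {
        # Encoding implied by LC_ALL / LC_CTYPE / LANG.
        "preferred": locale.getpreferredencoding(),
        # Encoding used by print; may be None when stdout is piped (Python 2).
        "stdout": getattr(sys.stdout, "encoding", None),
    }


def can_encode(text, encoding):
    """Return True if `text` can be encoded with `encoding`."""
    if not encoding:
        return False
    try:
        text.encode(encoding)
        return True
    except (UnicodeEncodeError, LookupError):
        return False


if __name__ == "__main__":
    sample = u"\u1f41"  # the character from the question
    for name, encoding in sorted(describe_encodings().items()):
        print(name, encoding, "can encode sample:", can_encode(sample, encoding))
```

Run in a new shell after the exports, both entries should report a UTF-8 encoding if the locale was picked up. With LC_ALL=en_US still in effect, the character from the question cannot be encoded, which matches the traceback above.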