Tags: string, python-3.x, python-2.x, non-ascii-characters

Python 3-like string conversion in Python 2


I am porting my code to Python 3 while maintaining backwards compatibility.

The str function converts strings with non-ASCII characters differently in Python 2 and Python 3. For example:

Python 2:

In [4]: str('Alnus viridis (Chaix) DC. ssp. sinuata (Regel) A. Löve & D. Löve')
Out[4]: 'Alnus viridis (Chaix) DC. ssp. sinuata (Regel) A. L\xc3\xb6ve & D. L\xc3\xb6ve'

But in Python 3:

In [1]: str('Alnus viridis (Chaix) DC. ssp. sinuata (Regel) A. Löve & D. Löve')
Out[1]: 'Alnus viridis (Chaix) DC. ssp. sinuata (Regel) A. Löve & D. Löve'

How can I get the same representation as Python 3 in Python 2? I am writing the strings to a sqlite3 table.


Solution

  • It appears that what you want is a unicode string literal. In Python 3, all normal string literals are unicode strings. In Python 2, a plain literal is a byte string (str), and only unicode values are unicode strings. To create a unicode string literal in Python 2, put a u in front of the literal:

    u'Alnus viridis (Chaix) DC. ssp. sinuata (Regel) A. Löve & D. Löve'
    

    This is the same representation as your Python 3 string. Note that if your source file is UTF-8 encoded and contains non-ASCII characters, Python 2 needs a special comment on the first or second line to declare the encoding, such as:

    # -*- coding: utf-8 -*-
    

    For more information on this, see PEP 263 or this other question.
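
To tie this back to the sqlite3 part of the question, here is a minimal sketch that should behave the same on Python 2.7 and Python 3.3+. The in-memory database and the `plants` table with its `name` column are hypothetical and only there for illustration. It also shows that encoding the unicode string as UTF-8 gives the `\xc3\xb6` bytes that Python 2 echoed for the plain literal.

```python
# -*- coding: utf-8 -*-
import sqlite3

# The u prefix gives a unicode string on Python 2 and is accepted on Python 3.3+.
name = u'Alnus viridis (Chaix) DC. ssp. sinuata (Regel) A. Löve & D. Löve'

# Encoding to UTF-8 produces the bytes that Python 2 displayed as \xc3\xb6.
encoded = name.encode('utf-8')

# Hypothetical in-memory table and column names, for illustration only.
conn = sqlite3.connect(':memory:')
conn.execute(u'CREATE TABLE plants (name TEXT)')
conn.execute(u'INSERT INTO plants (name) VALUES (?)', (name,))

# sqlite3 hands TEXT values back as unicode strings by default.
stored = conn.execute(u'SELECT name FROM plants').fetchone()[0]
conn.close()
```

If the value round-trips correctly, `stored == name` holds on both versions. If most literals in a module are text, `from __future__ import unicode_literals` at the top of the file is an alternative to prefixing each one with u.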