
Update the version of the Unicode Standard that is being used by Python


How can I check which version of the Unicode Standard is being used by Python? Does Python automatically use the latest version of the Unicode Standard? Do I need to update Python or a certain package to use the latest version of the Unicode Standard?

For example, the latest version of the Unicode Standard, version 13.0, was released in March 2020 and is available in electronic format from the consortium's website. If I use Python 3.6.1, which was released on March 21, 2017, can I benefit from all the updates that Unicode 13.0 brings?

I know that Unicode is an international standard that all computers are supposed to abide by, but I am not sure how Python handles it. Thanks in advance!

P.S. I am asking about Python 3 only, not Python 2. This question was posted on October 21, 2020.


Solution

  • The version of the Unicode character database is specified in the unicodedata module in the standard library.

    >>> # Python 3.9
    >>> import unicodedata
    >>> unicodedata.unidata_version
    '13.0.0'
    
    

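    The Unicode data is compiled into each Python release, so there is no simple way to update it separately from the interpreter. In practice, getting a newer Unicode version means installing a newer Python release.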

    This pull request shows what is done to upgrade the version from 12.1 to 13.0.

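    It's worth noting that using an older version of Python doesn't prevent you from processing text that contains more recently added characters: a str can hold any code point, and encoding and decoding work as usual. What you lose is the character data for them, so the functions in the unicodedata module (and other operations that depend on character properties, such as str.isalpha()) treat those characters as unassigned.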
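
The points above can be checked with a short script. This is only a sketch: the helper names `unicode_version` and `describe` are made up for illustration, and U+1F972 (SMILING FACE WITH TEAR, added in Unicode 13.0) is just one convenient test character.

```python
import sys
import unicodedata


def unicode_version():
    """Return the bundled Unicode database version as a tuple of ints."""
    return tuple(int(part) for part in unicodedata.unidata_version.split("."))


def describe(ch):
    """Return the Unicode name of ch, or a placeholder if this build has no data for it."""
    # unicodedata.name() raises ValueError for characters without a name
    # unless a default is supplied as the second argument.
    return unicodedata.name(ch, "<no name in this Unicode database>")


print("Python:", sys.version.split()[0])
print("Unicode database:", unicodedata.unidata_version)

# U+1F972 was added in Unicode 13.0 (illustrative choice of character).
new_char = "\U0001F972"

# Storing and encoding the character does not depend on the database version.
print(len(new_char))
print(new_char.encode("utf-8"))

# Looking up its properties does depend on the database version.
print(describe(new_char))
print("Has Unicode 13.0 data:", unicode_version() >= (13, 0, 0))
```

On a Python build whose database predates 13.0, the string still has length 1 and encodes normally, but `describe` should fall back to the placeholder because the character has no entry in the bundled data.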