python string unicode

Official repository of Unicode character names


There are a few ways to get the list of all Unicode character names: for example, using the Python module unicodedata, as explained in "List of unicode character names", or using the website https://unicode.org/charts/charindex.html. However, that index is incomplete, and you have to open and parse PDFs to find the names.

But what is the official source / repository of all Unicode character names? I'm looking for the original source of these names, in a machine-readable format, which is updated whenever a new character is added.

I'm looking for a list with just code point and name, in CSV or any other format:

code   character name
...
0102   LATIN CAPITAL LETTER A WITH BREVE
0103   LATIN SMALL LETTER A WITH BREVE
...

Solution

  • The official source for the actual character data (which includes the character names and many, many other details) is the Unicode Character Database.

    The latest version of the data files can be accessed via http://www.unicode.org/Public/UCD/latest/.

    Names specifically can be found in the file NamesList.txt. The format of that file is described here.

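    The same names are also available in a semicolon-separated format (not strictly CSV), where the first field of each line is the code point and the second is the character name: https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt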
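
To get the two-column list asked for in the question, the semicolon-separated UnicodeData.txt file can be reduced to its first two fields. The following is a minimal sketch, not a definitive implementation: the helper names `parse_unicode_data` and `fetch_unicode_names` are hypothetical, and the two sample lines are assumed to match the file's format. Note that some entries are placeholders rather than real names (for example `<control>`, or range markers such as `<CJK Ideograph, First>`), so they may need separate handling.

```python
import urllib.request

# URL taken from the answer above; the helper names below are hypothetical.
UNICODE_DATA_URL = "https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt"


def parse_unicode_data(text):
    """Return a list of (code point, name) pairs from UnicodeData.txt content."""
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split(";")
        # Field 0 is the code point in hex, field 1 is the character name.
        pairs.append((fields[0], fields[1]))
    return pairs


def fetch_unicode_names(url=UNICODE_DATA_URL):
    """Download UnicodeData.txt and return its (code point, name) pairs."""
    with urllib.request.urlopen(url) as response:
        return parse_unicode_data(response.read().decode("utf-8"))


# Two sample lines, assumed to be in the format of UnicodeData.txt.
sample = (
    "0102;LATIN CAPITAL LETTER A WITH BREVE;Lu;0;L;0041 0306;;;;N;LATIN CAPITAL LETTER A BREVE;;;0103;\n"
    "0103;LATIN SMALL LETTER A WITH BREVE;Ll;0;L;0061 0306;;;;N;LATIN SMALL LETTER A BREVE;;0102;;0102\n"
)
for code, name in parse_unicode_data(sample):
    print(f"{code}   {name}")
```

Calling `fetch_unicode_names()` downloads the current file and returns the full list, so re-running it after a new Unicode release should pick up newly added characters.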