unicode locale standards

Where is en_US.UTF-8 defined?


Where is the actual definition of the collating and comparison mappings for en_US.UTF-8? I assume there's some standards document, reference source code, and/or data table available somewhere?


Solution

  • It's Unicode.

    The compiled file /usr/lib/locale/en_US.utf8/LC_COLLATE is generated by localedef. man localedef gives /usr/share/i18n/locales as the path for its input, the locale source files.

    In /usr/share/i18n/locales/en_US, the LC_COLLATE section references the file iso14651_t1, which in turn references iso14651_t1_common. That file is published by ISO and names its originating source as unidata-9.0.0.txt. To see the history of these files, run git clone git://sourceware.org/git/glibc.git.

    According to http://enwp.org/ISO_14651, the ISO standard and the Unicode Collation Algorithm (UCA) are aligned, so the corresponding file at unicode.org is allkeys.txt.
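
The chain of references described above (en_US, then iso14651_t1, then iso14651_t1_common) can be traced with a short script instead of by hand. What follows is a minimal sketch, not part of glibc: the helper names first_copy_target and copy_chain are hypothetical, and it assumes that one locale source file pulls in another with a line of the form copy "name", and that files with no LC_COLLATE marker of their own can be scanned whole.

```python
import re
from pathlib import Path

# Assumed directive syntax in locale source files: copy "iso14651_t1"
COPY_RE = re.compile(r'^copy\s+"([^"]+)"')


def first_copy_target(text, section="LC_COLLATE"):
    """Return the first file named by a copy directive, or None.

    If the text contains the section marker, only lines between the marker
    and its END line are considered; otherwise the whole text is scanned.
    """
    lines = [line.strip() for line in text.splitlines()]
    inside = section not in lines  # no marker: treat the whole file as in scope
    for line in lines:
        if line == section:
            inside = True
        elif line == f"END {section}":
            inside = False
        elif inside:
            match = COPY_RE.match(line)
            if match:
                return match.group(1)
    return None


def copy_chain(locales_dir, name, section="LC_COLLATE"):
    """Follow copy directives starting at `name`; return the files visited."""
    chain = [name]
    while True:
        path = Path(locales_dir) / chain[-1]
        if not path.is_file():
            break  # referenced file is not present in this directory
        text = path.read_text(encoding="utf-8", errors="replace")
        target = first_copy_target(text, section)
        if target is None or target in chain:
            break  # end of the chain, or a cycle
        chain.append(target)
    return chain
```

On a system with the glibc locale sources installed, calling copy_chain("/usr/share/i18n/locales", "en_US") should list the files involved. The result depends on the installed glibc version, so no output is shown here.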