Where is the actual definition of the collating and comparison mappings for en_US.UTF-8? I assume there's some standards document, reference source code, and/or data table available somewhere?
It's Unicode.
/usr/lib/locale/en_US.utf8/LC_COLLATE is created by localedef. man localedef shows the input path /usr/share/i18n/locales.
The LC_COLLATE section of /usr/share/i18n/locales/en_US references the file iso14651_t1, which references iso14651_t1_common, which is a file published by ISO, which tells us the originating source unidata-9.0.0.txt.
Run git clone git://sourceware.org/git/glibc.git to see the history of these files.
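If you want to follow that chain of references on your own machine, here is a minimal Python sketch. It assumes the locale sources live under /usr/share/i18n/locales and that one file pulls in the next with a copy "name" line inside its LC_COLLATE section; check your files if they use a different directive. The function name collate_chain is made up for this example.

```python
import re
from pathlib import Path

# Matches a directive such as:  copy "iso14651_t1"
COPY_RE = re.compile(r'^\s*copy\s+"([^"]+)"')


def collate_chain(name, locale_dir="/usr/share/i18n/locales"):
    """Follow `copy` directives inside LC_COLLATE sections, starting at `name`.

    Returns the list of locale source names visited, in order. A name whose
    file does not exist is still listed, and the walk stops there.
    """
    chain = []
    seen = set()
    while name and name not in seen:  # `seen` guards against reference cycles
        seen.add(name)
        chain.append(name)
        path = Path(locale_dir) / name
        if not path.is_file():
            break
        next_name = None
        in_collate = False
        text = path.read_text(encoding="utf-8", errors="replace")
        for line in text.splitlines():
            stripped = line.strip()
            if stripped == "LC_COLLATE":
                in_collate = True
            elif stripped == "END LC_COLLATE":
                break
            elif in_collate:
                match = COPY_RE.match(line)
                if match:
                    # Only the first copy directive in the section is followed.
                    next_name = match.group(1)
                    break
        name = next_name
    return chain


if __name__ == "__main__":
    print(" -> ".join(collate_chain("en_US")))
```

The output depends on the glibc version installed, so compare it against the files themselves rather than against the chain described above.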
http://enwp.org/ISO_14651 says that the ISO standard and the Unicode Collation Algorithm (UCA) are aligned, so the corresponding file at unicode.org is allkeys.txt.
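To get a feel for what allkeys.txt contains, here is a rough Python sketch that parses single lines of it. It assumes the usual layout of that file: hex code points, a semicolon, then one or more collation elements written as [.primary.secondary.tertiary], with * in place of the first . for variable elements. The function name parse_allkeys_line is made up for this example, and the weights in the sample lines are placeholders, not values to rely on.

```python
import re

# One collation element, e.g. [.1CAD.0020.0002] or [*0209.0020.0002]
ELEMENT_RE = re.compile(
    r"\[([.*])([0-9A-Fa-f]+)\.([0-9A-Fa-f]+)\.([0-9A-Fa-f]+)\]"
)


def parse_allkeys_line(line):
    """Parse one line in allkeys.txt format.

    Returns (characters, elements), where each element is a tuple
    (is_variable, primary, secondary, tertiary). Returns None for blank
    lines, comment lines, and @-directives such as @version.
    """
    body = line.split("#", 1)[0].strip()  # drop the trailing comment
    if not body or body.startswith("@"):
        return None
    code_points, _, elements = body.partition(";")
    chars = "".join(chr(int(cp, 16)) for cp in code_points.split())
    weights = [
        (marker == "*", int(p, 16), int(s, 16), int(t, 16))
        for marker, p, s, t in ELEMENT_RE.findall(elements)
    ]
    return chars, weights


if __name__ == "__main__":
    # Placeholder lines in the assumed format; the weights are illustrative.
    sample = [
        "@version 9.0.0",
        "0020 ; [*0209.0020.0002] # SPACE",
        "0061 ; [.1CAD.0020.0002] # LATIN SMALL LETTER A",
    ]
    for entry in sample:
        print(parse_allkeys_line(entry))
```

Comparing strings by the parsed weights level by level (all primaries first, then secondaries, then tertiaries) is the basic idea behind the collation order, though the full algorithm has more steps than this sketch covers.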