Tags: sorting, perl, unicode, icu

Using a collating sequence specified in an LDML file for doing a line sort


I have an LDML file that specifies a collating sequence for a language not listed in /usr/share/locale.

I want to use the collating sequence from the LDML file to do a line sort in Linux.

My preferred tool is the sort command (GNU coreutils), run from bash.

I could also use the Perl module Unicode::ICU::Collator, if I understood how to set it up with the information from the LDML file.
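What an ICU-based tool needs from the LDML file is the collation tailoring. In current LDML (CLDR) files, the tailoring is written in ICU rule syntax as the text of a <cr> element inside <collations>/<collation>. The following is a minimal sketch, using only Python's standard library, that extracts those rules. It assumes that layout and a collation of type "standard"; older LDML files that spell the rules out as XML elements (<rules>, <reset>, <p>, ...) are not handled. The function name ldml_collation_rules is hypothetical.

```python
import sys
import xml.etree.ElementTree as ET


def ldml_collation_rules(source, coll_type="standard"):
    """Return the ICU tailoring rules stored in an LDML file.

    `source` is a filename or an open file object.  Assumes the CLDR
    layout <ldml><collations><collation type="..."><cr>RULES</cr>,
    where RULES is ICU rule syntax (often wrapped in CDATA).
    """
    root = ET.parse(source).getroot()
    for coll in root.iter("collation"):
        # Treat a <collation> without a type attribute as "standard"
        # (an assumption of this sketch, not a rule from the spec).
        if coll.get("type", "standard") != coll_type:
            continue
        cr = coll.find("cr")
        if cr is not None and cr.text and cr.text.strip():
            return cr.text.strip()
    raise ValueError(f"no <cr> rules found for collation type {coll_type!r}")


if __name__ == "__main__":
    # Print the rules so they can be saved to a file for an ICU collator.
    print(ldml_collation_rules(sys.argv[1]))
```

For example, `python3 ldmlrules.py mylang.xml > rules.txt` (both file names hypothetical) writes plain rule text such as `&a < c` that can be handed to an ICU collator constructor that accepts tailoring rules.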


Solution

  • A Python (rather than Perl) solution is available, using the ICU library, documented at:

    https://github.com/silnrsi/collation

    and

    https://scriptsource.org/cms/scripts/page.php?item_id=entry_detail&uid=lcepuup9ga

    Sample code is at:

    https://github.com/WesPeacock/ldml-sort
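As a rough illustration of the ICU approach, here is a minimal line-sort filter. It assumes the PyICU bindings (pip package PyICU, module icu) and a plain-text file holding the ICU tailoring rules taken from the LDML <cr> element. It is a sketch and not necessarily how the ldml-sort sample code works; the script name icusort.py and the function sort_lines are hypothetical.

```python
#!/usr/bin/env python3
"""Sort lines of text with an ICU collator built from tailoring rules."""
import sys


def sort_lines(lines, rules):
    """Return `lines` sorted by an ICU collator built from `rules`."""
    # PyICU wraps the ICU C++ library; imported here so that this file
    # can still be imported where PyICU is not installed.
    from icu import RuleBasedCollator

    # The rules (e.g. "&a < c") tailor ICU's root collation order.
    collator = RuleBasedCollator(rules)
    # getSortKey returns bytes that compare in collation order.
    return sorted(lines, key=collator.getSortKey)


def main():
    # Usage: icusort.py RULES_FILE < input.txt > sorted.txt
    with open(sys.argv[1], encoding="utf-8") as f:
        rules = f.read()
    lines = sys.stdin.read().splitlines()
    sys.stdout.write("".join(line + "\n" for line in sort_lines(lines, rules)))


if __name__ == "__main__":
    main()
```

Used like `python3 icusort.py rules.txt < input.txt > sorted.txt`, it takes the place of sort in a pipeline. GNU sort itself orders lines by the current LC_COLLATE locale, so it cannot read an LDML file directly.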