Where are the fields documented for the unicode.org file "UnicodeData.txt"?


I cannot find documentation for the actual fields of the UnicodeData.txt file.

The data is available here. The document describing it is available here, but it doesn't list the actual field numbers and what each field is (as the document did around version 3.0).

I've searched the site and must be missing something that is right in front of my eyes, but I can't find it.

Can someone point out where this information is now?


Solution

  • Update 2024-05-01

    @chx's comment on the question points to the page, About the Unicode Character Database (https://unicode.org/ucd/). The first paragraph on that page currently reads:

    The Unicode Character Database (UCD) consists of a number of data files listing Unicode character properties and related data. It also includes data files containing test data for conformance to several important Unicode algorithms. Full documentation for the UCD can be found in Unicode Standard Annex #44, Unicode Character Database.

    Annex #44 (and its link) contains a full description of UnicodeData.txt and the other files that comprise the UCD. @mcmcc and @AlexisWilke also point to Annex #44 as the current definition.

    Update

    Sorry, I misread the question. Still, I think the information is in the link you provided, under the section UnicodeData.txt. For each field, a link inside the document lists its values, if applicable. It seems to be the same list as in the 3.0 version.
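
    As an illustration of the field layout, here is a minimal Python sketch that splits one UnicodeData.txt line into named fields. It assumes the 15 semicolon-separated fields (numbered 0 to 14) that the UnicodeData.txt section of Annex #44 describes. The dictionary keys and the function name parse_unicode_data_line are my own labels, not official identifiers, so check the order against the annex before relying on it.

```python
# Labels for fields 0..14 of UnicodeData.txt. The names are informal
# (an assumption for this sketch); the order should be verified
# against the UnicodeData.txt section of UAX #44.
FIELD_NAMES = (
    "code_point",                 # 0
    "name",                       # 1
    "general_category",           # 2
    "canonical_combining_class",  # 3
    "bidi_class",                 # 4
    "decomposition",              # 5 (type and mapping)
    "numeric_decimal",            # 6
    "numeric_digit",              # 7
    "numeric_value",              # 8
    "bidi_mirrored",              # 9
    "unicode_1_name",             # 10
    "iso_comment",                # 11
    "simple_uppercase",           # 12
    "simple_lowercase",           # 13
    "simple_titlecase",           # 14
)


def parse_unicode_data_line(line: str) -> dict:
    """Split one UnicodeData.txt line into a dict keyed by FIELD_NAMES."""
    fields = line.rstrip("\r\n").split(";")
    if len(fields) != len(FIELD_NAMES):
        raise ValueError(
            f"expected {len(FIELD_NAMES)} fields, got {len(fields)}"
        )
    return dict(zip(FIELD_NAMES, fields))


# The entry for U+0041 as it appears in UnicodeData.txt.
sample = "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;"
record = parse_unicode_data_line(sample)
print(record["name"], record["general_category"], record["simple_lowercase"])
```

    Empty fields come back as empty strings, so callers can tell "no value" apart from a real value such as "0".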