Tags: string, icu, capitalization, title-case

Proper title case in ICU [Does ICU have a list of non-capitalized words?]


Is it possible to obtain proper title capitalization for, say, English text using ICU4C, without building a custom set of non-capitalized words? For example, given "pining for the fjords" I'd like to obtain "Pining for the Fjords".

With ucasemap_utf8ToTitle() and UnicodeString::toTitle() I get "Pining For The Fjords", no matter which BreakIterator or locale I use.


Solution

  • @Jongware should get the credit for explaining this so well. Your question boils down to: does ICU have a list of non-capitalized words?

    The short answer for ICU is: no.

    CLDR (from which ICU gets its data) used to have "stop words" for search purposes, but they were not well maintained and were removed: http://unicode.org/cldr/trac/ticket/5204
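
Since ICU ships no such list, you end up supplying your own. Below is a minimal sketch of that idea in plain C++ (no ICU calls), assuming ASCII-only, whitespace-separated input. The word list `kMinorWords` and the helper `titleCaseEnglish` are hypothetical names made up for this illustration, and the list contents are just one possible choice, since style guides disagree on which words stay lower-case.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical list of "minor" words. Neither ICU nor CLDR provides one,
// so the contents here are an assumption; adjust to your style guide.
static const std::unordered_set<std::string> kMinorWords = {
    "a", "an", "and", "as", "at", "but", "by", "for", "in",
    "nor", "of", "on", "or", "the", "to"};

// Hypothetical helper: title-cases ASCII text, keeping minor words
// lower-case unless they are the first or last word of the input.
// Runs of whitespace are collapsed to single spaces.
std::string titleCaseEnglish(const std::string& input) {
    // Split on whitespace.
    std::istringstream stream(input);
    std::vector<std::string> words;
    for (std::string word; stream >> word;) {
        words.push_back(word);
    }

    std::string result;
    for (std::size_t i = 0; i < words.size(); ++i) {
        std::string word = words[i];

        // Lower-case the whole word first so the list lookup is
        // case-insensitive (this also turns "NASA" into "Nasa").
        for (char& c : word) {
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        }

        // Capitalize the first letter unless this is a minor word
        // in the middle of the title.
        const bool isEdge = (i == 0 || i + 1 == words.size());
        if (isEdge || kMinorWords.count(word) == 0) {
            word[0] = static_cast<char>(
                std::toupper(static_cast<unsigned char>(word[0])));
        }

        if (i > 0) {
            result += ' ';
        }
        result += word;
    }
    return result;
}
```

Calling `titleCaseEnglish("pining for the fjords")` returns "Pining for the Fjords". In real code you would probably keep ICU for the word segmentation and the per-word case mapping (so non-ASCII text and punctuation are handled properly) and only use the list to decide which words to skip; a word with attached punctuation such as "the," is not matched by this sketch.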