Is it possible to obtain proper capitalization for, e.g., English text using ICU4C, but without building a custom set of non-capitalized words? Say, given "pining for the fjords", I'd like to obtain "Pining for the Fjords".

With ucasemap_utf8ToTitle() and UnicodeString::toTitle I get "Pining For The Fjords", no matter which BreakIterator or locale I use.
@Jongware should get the credit for explaining this so well. Your question might really be: does ICU have a list of non-capitalized words?
The short answer for ICU is: no.
CLDR (from which ICU gets its data) used to have "stop words" for search purposes, but they were not well maintained and were removed: http://unicode.org/cldr/trac/ticket/5204
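Since neither ICU nor CLDR ships such a list, the "minor words" have to come from your own code. The following is a minimal C++ sketch of that idea, not an ICU API: the function name titleCaseEnglish and the word list kMinorWords are hypothetical and hand-picked here, and the word splitting is ASCII-only. Which words count as minor depends on the style guide you follow. For real Unicode text you would likely replace the ASCII loop with ICU's word BreakIterator and per-word case mapping.

```cpp
#include <cctype>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical, hand-picked English "minor words" (NOT provided by ICU or CLDR).
static const std::unordered_set<std::string> kMinorWords = {
    "a", "an", "and", "as", "at", "but", "by", "for", "in",
    "nor", "of", "on", "or", "the", "to"};

static bool isAsciiLetter(char c) {
    return std::isalpha(static_cast<unsigned char>(c)) != 0;
}

// Lowercases an ASCII string (sketch only: not Unicode-aware).
static std::string asciiLower(const std::string& s) {
    std::string out = s;
    for (char& c : out) c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return out;
}

// Title-cases ASCII English text: every word is capitalized except the
// minor words, but the first and last words are always capitalized.
std::string titleCaseEnglish(const std::string& text) {
    // Collect [begin, end) offsets of words: runs of letters, allowing an
    // apostrophe inside a word when a letter follows it (e.g. "don't").
    std::vector<std::pair<size_t, size_t>> words;
    for (size_t i = 0; i < text.size();) {
        if (!isAsciiLetter(text[i])) { ++i; continue; }
        size_t j = i + 1;
        while (j < text.size() &&
               (isAsciiLetter(text[j]) ||
                (text[j] == '\'' && j + 1 < text.size() && isAsciiLetter(text[j + 1]))))
            ++j;
        words.emplace_back(i, j);
        i = j;
    }

    std::string out = text;
    for (size_t w = 0; w < words.size(); ++w) {
        size_t begin = words[w].first;
        size_t end = words[w].second;
        std::string lower = asciiLower(text.substr(begin, end - begin));
        bool isEdgeWord = (w == 0 || w + 1 == words.size());
        bool isMinor = kMinorWords.count(lower) > 0;
        // Write the lowercased word back, then capitalize its first letter if needed.
        for (size_t k = 0; k < lower.size(); ++k) out[begin + k] = lower[k];
        if (isEdgeWord || !isMinor)
            out[begin] = static_cast<char>(std::toupper(static_cast<unsigned char>(out[begin])));
    }
    return out;
}
```

With this list, titleCaseEnglish("pining for the fjords") gives "Pining for the Fjords", while a trailing minor word is still capitalized, as in "What Is It For".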