
Unicode lowercase characters?


I read somewhere that Unicode has characters other than A-Z with a lowercase equivalent. Which characters are these, and why would any other character need an upper and a lower case?


Solution

  • The English language, and even that strange variant, American English :-), is not the only language on the planet. There are some very strange-looking scripts (at least to those familiar with Latin-based characters), and even the Latin-based ones have minor variations.

    Two that I know on more than a casual basis are Greek and German:

    Αα Ββ Γγ Δδ Εε Ζζ  Ηη Θθ Ιι Κκ Λλ Μμ
    Νν Ξξ Οο Ππ Ρρ Σσς Ττ Υυ Φφ Χχ Ψψ Ωω
    
    Aa Ää Bb Cc Dd Ee Ff Gg Hh Ii Jj Kk Ll Mm Nn
    Oo Öö Pp Qq Rr Ss ß  Tt Uu Üü Vv Ww Xx Yy Zz
    

    That's why we're not allowed to use bits of code like:

    char lower = upper - 'A' + 'a';
    

    any more. That arithmetic only works for the 26 unaccented letters A-Z, and doing something like it in a company that takes i18n seriously is near grounds for dismissal. Using Unicode-aware toLower()/toUpper()-type functions is the better way to go.
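
To make the point concrete, here is a small sketch in Python, chosen only because its built-in `str.lower()` and `str.upper()` are Unicode-aware; the answer above does not name a language. The helper `naive_lower` is a hypothetical name that mirrors the `upper - 'A' + 'a'` trick.

```python
def naive_lower(ch: str) -> str:
    """ASCII-only trick: shift the code point by the 'A' -> 'a' offset (32)."""
    return chr(ord(ch) - ord("A") + ord("a"))


# The shift is fine for the 26 plain ASCII capitals.
assert naive_lower("Q") == "q"

# It breaks when the two cases are not 32 code points apart:
# U+0100 (Ā) lowercases to U+0101 (ā), but the shift lands on U+0120 (Ġ).
assert naive_lower("\u0100") == "\u0120"
assert "\u0100".lower() == "\u0101"

# Case mapping is not always one character to one character:
# German ß uppercases to the two letters "SS".
assert "straße".upper() == "STRASSE"

# Greek has two lowercase sigmas; the final form ς is used at the end of a word.
assert "ΟΔΥΣΣΕΥΣ".lower() == "οδυσσευς"
```

The last two assertions show why a per-character offset cannot work in general: the result can change length or depend on where the letter sits in the word, so the conversion has to operate on whole strings with Unicode case tables behind it.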