Search code examples
sqlpostgresqlencoding

UPPER() doesn't work with cyrillic symbols in PostgreSQL 8.2 database


I have PostgreSQL 8.2 running on Ubuntu server, database encoding is UTF8. Everything is fine, except one thing: I cannot use upper() function in sql queries for cyrillic symbols. Well, I can use it, but the result of the function will still be in lowercase.

How could this be fixed?


Solution

  • This could be fixed by upgrading your version of PostgreSQL, preferably to the current version 9.1. PostgreSQL 8.2 has been released in 2006 and reached end of life recently.

    There have been a large number of improvements for locales since then. As far as I can see, it works for me in version 9.0 (on Debian Squeeze):

    SELECT upper('ѓғг̡г̨һһ̨дђеёӗ');
    

    Result:

    ЃҒГ̡Г̨ҺҺ̨ДЂЕЁӖ
    

    Note: I use UTF8 as server_encoding, UNICODE as client_encoding and *.UTF-8 (de_AT.UTF-8 to be precise) for all locale settings (lc_*). Other encodings or locales may have specific problems. You might want to show us the output of this query:

    SELECT * FROM pg_settings WHERE name ~~ '%encod%' OR name ~~ 'lc%'
    

    Edit after feedback in comments:

    If "nothing" is set, that means the pseudo-locale C (aka Posix) is set. C is agnostic to any locale settings, which allows for faster sort operations. It also means that your system has no idea what the upper case equivalent of a Cyrillic letter is supposed to be.

    Be sure to read the chapter "Locale Support" in the manual.

    However, LC_CTYPE can only be set at db cluster creation time. In newer versions you can base a new database off template0 and specify different locale and encoding settings, but not in version 8.2.

    So, I come back to my initial advise: best course of action would be to install a more recent version of PostgreSQL. And remember to set up your desired locale when setting up the new db cluster.

    PostgreSQL 9.1 is available for Debian Squeeze. So, there should be something for Ubuntu, too.