Search code examples
postgresqllocale

PostgreSQL improperly sorts unicode chars with Czech collation


I have a table with collation set to cs_CZ (Czech):

    Name   | Encoding |  Collation  |    CType
-----------+----------+-------------+-------------
 foo       | UTF8     | cs_CZ.UTF-8 | cs_CZ.UTF-8

but when I order by string, result is not sorted as it should be according to Czech alphabet:

=> SELECT surname FROM foo ORDER BY surname;
     surname     
-----------------
A
Da
Ďb
Dc
E

So it is sorted like if the unicode accent character (Ď) was converted to its ASCII version without accent (D). But the Czech alphabet is: ... C -> D -> Ď -> E ..., so the returned order is incorrect (in this example it should be: A -> Da -> Dc -> Ďb -> E).

Is this usual PostgreSQL behavior? Is there a way how get it sorted correctly according to Czech alphabet?

EDIT: Tried on Postgres 9.1.4, both have the same behavior. It is an Arch Linux machine.
EDIT2: Adjusted example, the Ď is the real problem.


Solution

  • It is correct. Accent for á, ď, é, ě, í, ň, ó, ť, ú, ů, ý should be ignored see article

    Czech sort rules are little bit complex :)