Search code examples
postgresqllocale

Comparing Text on PostgreSQL 8.4 and 9.1


I have two databases, one is running on postgresql 8.4 and the other on postgresql 9.1. Both are on CentOS machines with the same locale (en_US).

Suppose i have a table with this data:

id | description
1    Morango
2    CAFÉ
3    pera
4    Uva

The odd thing is, when i run a query like this one:

SELECT * FROM products WHERE description ~* 'café'

On the 8.4 machine i get no results, but on the 9.1 machine i got the row (CAFÉ). Apparently they differ on how to compare the upcase unicode character.

  1. Could someone give me some insight about this problem?
  2. Is it the different version o postgresql that can cause this problem?
  3. Are there any additional configuration i could make to equalize the behavior from the two machines?

UPDATE: Both databases are UTF-8


Solution

  • Case-insensitive regex matching for non-US Unicode characters was basically not supported before 9.0.

    See this snippet in the 9.0 release notes:

    E.14.3.6. Functions
    [...]
    Support locale-specific regular expression processing with UTF-8 server encoding (Tom Lane)

    Locale-specific regular expression functionality includes case-insensitive matching and locale-specific character classes. Previously, these features worked correctly for non-ASCII characters only if the database used a single-byte server encoding (such as LATIN1). They will still misbehave in multi-byte encodings other than UTF-8.