Search code examples
mysqlutf-8utf8mb4

SELECT Hüsby returns the 'wrong' output Husby


In the same database, I run the query and get one correct and one incorrect result.

correct

SELECT Ort FROM `stammdaten` WHERE `Ort` = 'Husby';
    
Ort 
Husby   
Husby   

wrong

SELECT Ort FROM stammdaten WHERE Ort = 'Hüsby';

Ort
Husby
Husby

database : utf8mb4_unicode_ci

table : utf8mb4_unicode_ci

field : utf8mb4_unicode_ci

Does anyone have any ideas what else I can change or check?


Solution

  • You can check the documentation, because everything works as expected:

    • 10.3.1 Collation Naming Conventions: "Collation suffixes indicate whether a collation is case-sensitive, accent-sensitive, or kana-sensitive (or some combination thereof), or binary." The table shows that the suffix _ci at least stands for "case insensitivity".
    • "For nonbinary collation names that do not specify accent sensitivity, it is determined by case sensitivity. If a collation name does not contain _ai or _as, _ci in the name implies _ai and _cs in the name implies _as." So the collation utf8mb4_unicode_ci is also accent insensitive.
    • If you want accent sensitivity but at the same time case insensitivity then choose utf8mb4_0900_as_ci as per 10.2 Character Sets and Collations in MySQL.
    • Blindly casting columns and/or literals as a BINARY type is not the same as applying the utf8mb4_bin collation, as it in general comes with more restrictions. See 10.8.5 The binary Collation Compared to _bin Collations.

    Unicode needs to be understood (ü versus ), case insensitivity rules need to be understood (ß versus SS), and accent insensitivity need to be understood (Café versus cafè). Otherwise you end up having data stored which you are unable to properly find or filter by because you chose the wrong collation. Understanding the sorting is also an aspect (does ü sort after u or after ö?), although rarily of interest.