Search code examples
mysqlcollation

MySQL - problem with comparing strings constants containing national characters


I have the following test schema at MySQL 8.0.32:

CREATE SCHEMA `test_schema` DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_pl_0900_ai_ci ;

CREATE TABLE `test_schema`.`test_table` (
  `test_column` VARCHAR(64) NULL
);

INSERT INTO `test_schema`.`test_table` (`test_column`) VALUES('Łucja');
INSERT INTO `test_schema`.`test_table` (`test_column`) VALUES('Lucjan');

As one might expect, there is no explicit collation provided for the table and column, so they inherit the schema setting (I have checked that, skipping the SQL statements for clarity).

Now I am running the following and getting the expected result:

mysql> SELECT * FROM test_table WHERE test_column LIKE 'Ł%';
+-------------+
| test_column |
+-------------+
| Łucja       |
+-------------+

In Polish language letter Ł is something totally different from letter L and the collation handles it correctly.

I'd expect that the following query returns no results, but I get surprised here:

mysql> SELECT * FROM test_table WHERE 'Lucjan' LIKE 'Ł%';
+-------------+
| test_column |
+-------------+
| Łucja       |
| Lucjan      |
+-------------+

After few tests of the above kind I can tell, that "standalone" strings (i.e. not coming from the table) do not use the schema collation setting when being compared. For example SELECT 1 WHERE 'L' LIKE 'Ł%' returns a row.

I know that this behavior can be overridden by using BINARY keyword:

SELECT 1
WHERE 'L' LIKE BINARY 'Ł%'

returns empty result set. Anyway when I try using it in a stored function, I get the following warning:

1287 'BINARY expr' is deprecated and will be removed in a future release. Please use CAST instead

Indeed, SELECT 1 WHERE CAST('L' AS CHAR(6) CHARSET latin2) LIKE 'Ł%' also does not return any rows. Anyway I would love to have more clean and simply solution.

Is it possible to override the problem making behavior at schema or session level?
Or at least is there any less complex and not deprecated piece of code I could use here?


Solution

  • If you use a bare string constant, it is not associated with your table, so it does not use the collation of that table. It uses the collation of your session.

    mysql> SELECT 'Lucjan' LIKE 'Ł%' as same;
    +------+
    | same |
    +------+
    |    1 |
    +------+
    

    I can explicitly tell either string argument which collation to use, and this overrides the session collation.

    mysql> SELECT 'Lucjan' LIKE 'Ł%' collate utf8mb4_pl_0900_ai_ci as same;
    +------+
    | same |
    +------+
    |    0 |
    +------+
    

    It works if either string argument is explicitly given a collation.

    mysql> SELECT 'Lucjan' collate utf8mb4_pl_0900_ai_ci LIKE 'Ł%' as same;
    +------+
    | same |
    +------+
    |    0 |
    +------+
    

    Or I can change my session collation, and then subsequent string constant expressions use that collation.

    mysql> set names utf8mb4 collate utf8mb4_pl_0900_ai_ci;
    
    mysql> SELECT 'Lucjan' LIKE 'Ł%' as same;
    +------+
    | same |
    +------+
    |    0 |
    +------+