I have a table like this
CREATE TABLE `mb1` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(30) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `u_name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
And I insert tow rows
insert into mb1(name) values('K'),('K');
Note, the second K is unicode character
+------+-----------+
| name | hex(name) |
+------+-----------+
| K | 4B |
| K | EFBCAB |
+------+-----------+
Why do they cause unique key collision? Aren't they different character in utf8mb4?
After remove COLLATE utf8mb4_unicode_ci, the problem disappeared.
Why do they cause unique key collision? Aren't they different character in utf8mb4?
You are missing the point about CHARACTER SET
and COLLATION
. A CHARACTER SET
is a collection of different characters. A COLLATION
says whether to treat the characters as equal -- think A
and a
-- different characters, but treated for ORDER BY
and WHERE =
, etc as being the same.
mysql> SELECT 'K'='K' COLLATE utf8_unicode_ci;
+-----------------------------------+
| 'K'='K' COLLATE utf8_unicode_ci |
+-----------------------------------+
| 1 |
+-----------------------------------+
So in utf8_unicode_ci (or utf8mb4_unicode_ci), those two characters are considered to be "equal".
"Equal" is the test for UNIQUE
keys.
Set the COLLATION
for the column to whatever makes sense for you.
utf8mb4_latvian_ci is a little different: K=k but not equal to Ķ=ķ . There are other specialized collations for other languages (mostly Western European).
Your K
is called "FULLWIDTH LATIN CAPITAL LETTER K", so it is quite reasonable that it compare equal to Latin K
.