I'm trying to insert a surrogate pair ('𤭢', \uD852\uDF62
, the same as U+24B62
from this example) into MySQL.
An INSERT
with an unescaped literal, suggested by this answer:
INSERT INTO unicode_test (value) VALUES ('𤭢');
-- or
INSERT INTO unicode_test (value) VALUES (_utf8'𤭢');
fails with
Error Code: 1366. Incorrect string value: '\xF0\xA4\xAD\xA2' for column 'value' at row 1
(note that \xF0\xA4\xAD\xA2
isn't even close to the original value of \uD852\uDF62
).
On the other hand, both
INSERT INTO unicode_test (value) VALUES (_utf16'𤭢');
and
INSERT INTO unicode_test (value) VALUES (_utf8mb4'𤭢');
succeed, but the inserted values are different from the original one.
My database uses the utf8mb4
character set, so I assume it should handle surrogates transparently.
What is the recommended way of inserting non-BMP characters into MySQL?
Use CHARACTER SET utf8mb4
, not utf8
or utf16
.
See "Best practice" in http://stackoverflow.com/questions/38363566/trouble-with-utf8-characters-what-i-see-is-not-what-i-stored
It sounds like you don't have the connection parameter (or SET NAMES
).
What you tried should have worked:
INSERT INTO unicode_test (value) VALUES ('𤭢');
Another approach is
INSERT INTO unicode_test (value) VALUES (UNHEX('F0A4ADA2'));