Search code examples
mysqlsqlunicodesurrogate-pairs

Inserting a surrogate pair into MySQL with an INSERT statement


I'm trying to insert a surrogate pair ('𤭢', \uD852\uDF62, the same as U+24B62 from this example) into MySQL.

An INSERT with an unescaped literal, suggested by this answer:

INSERT INTO unicode_test (value) VALUES ('𤭢');
-- or
INSERT INTO unicode_test (value) VALUES (_utf8'𤭢');

fails with

Error Code: 1366. Incorrect string value: '\xF0\xA4\xAD\xA2' for column 'value' at row 1

(note that \xF0\xA4\xAD\xA2 isn't even close to the original value of \uD852\uDF62).

On the other hand, both

INSERT INTO unicode_test (value) VALUES (_utf16'𤭢');

and

INSERT INTO unicode_test (value) VALUES (_utf8mb4'𤭢');

succeed, but the inserted values are different from the original one.

My database uses the utf8mb4 character set, so I assume it should handle surrogates transparently.

What is the recommended way of inserting non-BMP characters into MySQL?


Solution

  • Use CHARACTER SET utf8mb4, not utf8 or utf16.

    See "Best practice" in http://stackoverflow.com/questions/38363566/trouble-with-utf8-characters-what-i-see-is-not-what-i-stored It sounds like you don't have the connection parameter (or SET NAMES).

    What you tried should have worked:

    INSERT INTO unicode_test (value) VALUES ('𤭢');
    

    Another approach is

    INSERT INTO unicode_test (value) VALUES (UNHEX('F0A4ADA2'));