Tags: mysql, mariadb, collation, utf8mb4

Adding collation to utf8mb4 charset - MySQL/MariaDB


If you want to add a custom collation in MySQL/MariaDB for UTF-8 charsets, you can modify .../charsets/Index.xml and extend the charset using LDML syntax:

<charset name="utf8">
  ...
  <collation name="utf8_myown_ci" id="1234">
    <rules>
      <reset>\u0000</reset>
        <i>\u0020</i> <!-- space -->
        ...
    </rules>
  </collation>
  ...
</charset>

But there is no charset tag with the name "utf8mb4". So I created one with name="utf8mb4" and added collation/rules tags, and in phpMyAdmin I could then choose the newly created collation. However, I couldn't insert four-byte characters; I get the error

"#1366 - Incorrect string value: '\xF0\x9F\x8D\xB5\xF0\x9F...' for field ..."

(With the built-in mb4 collation I can do it.)

To be more precise: I have one column (a) with the built-in collation utf8mb4_general_ci and one column (b) with my own collation utf8mb4_myown_ci (defined in Index.xml). I insert the same data into both columns; column a accepts it without error, while column b produces the error described above.

I created the following entry in Index.xml:

<charset name="utf8mb4">
  <family>Unicode</family>
  <description>UTF-8 MB4 Unicode</description>
  <collation name="utf8mb4_general_ci" id="45">
    <flag>primary</flag>
    <flag>compiled</flag>
  </collation>
  <collation name="utf8mb4_bin"     id="46">
    <flag>binary</flag>
    <flag>compiled</flag>
  </collation>
  <collation name="utf8mb4_myown_ci"  id="213">
  </collation>
</charset>

An empty collation tag does not seem to be a problem, because I created an empty utf8_myown_ci inside the charset named "utf8" and that works.

In the column with utf8mb4_myown_ci I can also insert 3-byte characters, so it seems to be interpreted as a utf8 collation.

I searched Google multiple times and also looked here, but I couldn't find any hints on how to add collations to charsets that aren't present in Index.xml.

Any ideas how to do it? Thank you for any hints!


Solution

  • It turns out I used a collation ID that was already taken. If I use e.g. 501 instead of 213, it works.
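
For reference, here is a sketch of what the corrected Index.xml entry might look like. It reuses the entry from the question and only changes the ID. The value 501 comes from the solution above and is assumed to be free on your server; the set of occupied IDs can differ between MySQL and MariaDB versions, so it is worth comparing against the server's own list first. The server most likely needs a restart before it picks up changes to Index.xml.

<charset name="utf8mb4">
  <family>Unicode</family>
  <description>UTF-8 MB4 Unicode</description>
  <collation name="utf8mb4_general_ci" id="45">
    <flag>primary</flag>
    <flag>compiled</flag>
  </collation>
  <collation name="utf8mb4_bin" id="46">
    <flag>binary</flag>
    <flag>compiled</flag>
  </collation>
  <!-- The id must not collide with any existing collation.
       To see which ids are taken, run on the server:
       SELECT ID, COLLATION_NAME FROM INFORMATION_SCHEMA.COLLATIONS ORDER BY ID; -->
  <collation name="utf8mb4_myown_ci" id="501">
  </collation>
</charset>

If the ID already belongs to a built-in utf8 collation, the server presumably resolves the new name to that existing collation, which would explain why column b accepted 3-byte characters but rejected 4-byte ones.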