Search code examples
phpmysqlutf-8doctrine-ormutf8mb4

PHP 7 & Doctrine 2.4.3 : Changing column charset from utf8 to utf8mb4


I'm working on a project using Doctrine 2.4.3 with a MySQL 5.7.21 database with utf8 as default charset.

Recently, I've been looking to implement emoji support. To overcome MySQL's limitation of 3 bytes for utf8, I need to change the columns that can receive emojis to the utf8mb4 charset (see https://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html).

However, I have not found a way to reflect this in my entities (using annotations).

My database connection config is the following :

$data = array(
    'driver' => 'pdo_mysql',
    'host' => $dbhost,
    'port' => $dbport,
    'dbname' => $dbname,
    'user' => $dbuser,
    'password' => $dbpw,
    'charset' => 'utf8mb4'
);

I tried adding annotations to the table : /* @Entity(repositoryClass="path\to\DAO") @Table(name="post", indexes={@Index(name="uid", columns={"uid"})}, options={"charset":"utf8mb4", "collation":"utf8mb4_unicode_ci"}) * @HasLifecycleCallbacks */ class Post extends BaseEntity { ... }

In the same fashion, tried adding annotations to the column (in the same table) itself : /* @Column(type="text", options={"charset":"utf8mb4", collation":"utf8mb4_unicode_ci"}) */ protected $text;

None of the above worked. I expected an ALTER TABLE query when executing doctrine orm:schema-tool:update --dump-sql but Doctrine sees no change, and I still can't insert 4 bytes emojis.

If I update the column's charset myself directly in MySQL, emojis do get supported, but when I do run orm:schema-tool:update, Doctrine sees a difference between my entity and the schema, but seems to not know what to make of it since the output I get is the following : ALTER TABLE post CHANGE text text LONGTEXT NOT NULL ;

I also tried to add SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci as driverOptions in my database connection config array, alas to no result either.

Unfortunately, I could not find anything regarding this matter in Doctrine's documentation.

If any of you has any clue regarding this matter, feel free to hit me up! Thanks in advance.


Solution

  • As I have legacy requirements and cannot update Doctrine's lib as of right now, I had to find a workaround.

    What I did was manually convert my tables to utf8mb4 with SQL queries, which is not overwritten by Doctrine back to utf8 when executing orm:schema-tool:update --force after the charset conversion.

    For the record, I generated the update statements with the following script :

    SELECT CONCAT('ALTER TABLE ', t.table_schema, '.', t.table_name, ' CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;')
    FROM information_schema.tables t
    WHERE t.table_schema LIKE {your_schema};
    

    ^ Do not execute this blindly - check beforehand if existing data will fit while utf8mb4 encoded. For more details check the very good article from Mathias Bynens on the matter : https://mathiasbynens.be/notes/mysql-utf8mb4#column-index-length

    I also changed the database's charset settings.

    ALTER DATABASE {database_name} CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
    

    I did keep the 'charset' => 'utf8mb4' in the Doctrine's database connection settings array for correct transmission of the data.

    For new entities (tables), annotating them with correct settings in table options does create them with the right charset and collation :

    @Entity @Table(name="table", options={"charset":"utf8mb4", "collate":"utf8mb4_unicode_ci"})
    

    Cheers.