Search code examples
mysqlperlutf8mb4

Perl MySQL utf8mb4 issue / possible bug


I am using Perl 5.20.2 and MySQL 5.5.57 on a Debian 8 machine. I recently discovered that MySQL's utf8 tables are limited to three-byte-characeters. As a consequence I can not store emojis. So, I tried utfmb4 tables which are supposed to address the issue. I changed the table from utf8 to utf8mb4 from inside the mysql client:

ALTER DATABASE `mydb` CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
ALTER TABLE `mydb`.`mytable` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
ALTER TABLE `mydb`.`mytable` CHANGE `object` `object` VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

Storing data in mytable seems to work, at least I can see the expected emoji in phpMyAdmin. However, when reading from the table I receive a 4-character result with 3 unprintable charaters. The following program is supposed to print the same emoji twice:

#!/usr/bin/perl

use 5.10.1;
use warnings;
use strict;
use DBI;

binmode(STDOUT, ':utf8');

my $object = "\x{1F600}";
my $hd_db  = DBI->connect('DBI:mysql:mydb:localhost', 'user', 'password');
$hd_db->do('SET NAMES utf8mb4');

# cleanup
my $delete = $hd_db->prepare("DELETE FROM mytable");
$delete->execute;

my $insert = $hd_db->prepare("INSERT INTO mytable (object) VALUES ('" . $object . "')");
$insert->execute;
my $select = $hd_db->prepare("SELECT * FROM mytable");
$select->execute;
my $row    = $select->fetchrow_hashref;

say $object;
say $row->{'object'};

Expected output:

😀
😀

Actual output:

😀
�

Seems like a bug to me. Any suggestion how to work around it?

EDIT: SELECTing the data from within the mysql client also shows the expected emoji

mysql> SET SESSION CHARACTER_SET_CLIENT = utf8mb4;
mysql> SET SESSION CHARACTER_SET_RESULTS = utf8mb4;
mysql> SELECT * FROM mytable;
+--------+
| object |
+--------+
| 😀      |
+--------+

Solution

  • You told MySQL to use UTF-8 for communication, but you also need to tell DBD::mysql to decode the data (or do it yourself).

    You want

    my $dbh = DBI->connect('DBI:mysql:mydb:localhost', 'user', 'password', {
       mysql_enable_utf8mb4 => 1,
    })
       or die($DBI::errstr);
    

    which is equivalent to

    my $dbh  = DBI->connect('DBI:mysql:mydb:localhost', 'user', 'password')
       or die($DBI::errstr);
    
    $dbh->do('SET NAMES utf8mb4')
       or die($dbh->errstr);
    
    $dbh->{mysql_enable_utf8mb4} = 1;