I am using flourishlib for a website. My client requested that we should be able to use emojis with mobile phones. In theory we should change the character-encoding from utf8 to utf8mb4 for the MySQL database.
So far, so good, however, if we make this switch, like this:
# For each database:
ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE utf8mb4_unicode_ci;
# For each table:
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
# For each column:
ALTER TABLE table_name CHANGE column_name column_name VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
# (Don’t blindly copy-paste this! The exact statement depends on the column type, maximum length, and other properties. The above line is just an example for a `VARCHAR` column.)
Then each character will use four bytes instead of three bytes. This would increase the database's size with 33%. This would result in worse performance and more storage space used up. So, as a result, we have decided to switch to an encoding of utf8mb4 for only specific columns of specific tables.
To make sure everything is all right, I have checked several things. Among them, I have checked flourishlib and found a few suspect parts:
There is an fUTF8 class, which does not seem to support utf8mb4
At fDatabase I am quoting some findings:
if ($this->connection && function_exists('mysql_set_charset') && !mysql_set_charset('utf8', $this->connection)) {
throw new fConnectivityException(
'There was an error setting the database connection to use UTF-8'
);
}
//...
// Make MySQL act more strict and use UTF-8
if ($this->type == 'mysql') {
$this->execute("SET SQL_MODE = 'REAL_AS_FLOAT,PIPES_AS_CONCAT,ANSI_QUOTES,IGNORE_SPACE'");
$this->execute("SET NAMES 'utf8'");
$this->execute("SET CHARACTER SET utf8");
}
At fSQLSchemaTranslation I can see this:
$sql = preg_replace('#\)\s*;?\s*$#D', ')ENGINE=InnoDB, CHARACTER SET utf8', $sql);
I have the suspicion that flourishlib will not support our quest of making a few columns of a few table have a character encoding of utf8mb4. I wonder whether we can upgrade something somehow to make this support. As a worst-case scenario, we can override every textual occurrence of utf8 to utf8mb4. However, that would be a very ugly hack and we wonder whether there is a better solution. Should we make this hack or is there a more orthodox approach?
I have resolved the issue. I have altered the tables where I wanted to support emojis by changing the column character set and collation, like this:
ALTER TABLE table_name CHANGE column_name column_name text CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
After that, I had to make a few ugly hacks to make flourishlib able to support emojis.
fDatabase.php:
line 685:
if ($this->connection && function_exists('mysql_set_charset') && !mysql_set_charset('utf8mb4', $this->connection)) {
throw new fConnectivityException(
'There was an error setting the database connection to use UTF-8'
);
}
line 717 stays the same, everything crashes if this line is changed:
if ($this->connection && function_exists('mysqli_set_charset') && !mysqli_set_charset($this->connection, 'utf8')) {
line 800:
// Make MySQL act more strict and use UTF-8
if ($this->type == 'mysql') {
$this->execute("SET SQL_MODE = 'REAL_AS_FLOAT,PIPES_AS_CONCAT,ANSI_QUOTES,IGNORE_SPACE'");
$this->execute("SET NAMES 'utf8mb4'");
$this->execute("SET CHARACTER SET utf8mb4");
}
fSQLSchemaTranslation.php:
line 1554:
$sql = preg_replace('#\)\s*;?\s*$#D', ')ENGINE=InnoDB, CHARACTER SET utf8mb4', $sql);
fXML.php:
line 403:
if (preg_replace('#[^a-z0-9]#', '', strtolower($encoding)) == 'utf8mb4') {
// Remove the UTF-8 BOM if present
$xml = preg_replace("#^\xEF\xBB\xBF#", '', $xml);
fCore::startErrorCapture(E_NOTICE);
$cleaned = self::iconv('UTF-8', 'UTF-8', $xml);
if ($cleaned != $xml) {
$xml = self::iconv('Windows-1252', 'UTF-8', $xml);
}
fCore::stopErrorCapture();
}
and finally, when there are modifications for any of the columns affected, I execute this:
App::db()->query("set names 'utf8mb4'");
which, essentially triggers the ->query()
execution of an fDatabase
object.