I have a PHP 7.3 project that's currently using MySQL 5.5 with utf8 tables. Some of the tables contain emoji data, which displays fine in the current project. I'm trying to update the project to MySQL 8.x, but when I do, the emoji data displays incorrectly.
First, I converted all the 5.5 tables to utf8mb4. In this state, the data displayed correctly. I then updated to 5.7, and things continued to work. I dumped the data, updated to 8.0, and reloaded it (I used the --default-character-set=utf8mb4 flag on both dump and load). After that the data stopped displaying correctly; for example, a lightbulb shows up as ðŸ’¡.
I am running each of these services in Docker. I was able to update from 5.5 to 5.7 using the same data volume without issue, but when I tried to upgrade from 5.7 to 8.0 I got errors I was unable to resolve, and ended up doing a data dump/restore.
An example table with a field with an emoji:
DROP TABLE IF EXISTS `forums`;
/*!40101 SET @saved_cs_client = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `forums` (
`forumID` int(11) NOT NULL AUTO_INCREMENT,
`title` varchar(200) COLLATE utf8_unicode_ci NOT NULL,
`description` text COLLATE utf8_unicode_ci,
`forumType` varchar(1) COLLATE utf8_unicode_ci DEFAULT 'f',
`parentID` int(11) DEFAULT NULL,
`heritage` varchar(25) COLLATE utf8_unicode_ci NOT NULL,
`order` int(5) NOT NULL,
`gameID` int(11) DEFAULT NULL,
`threadCount` int(11) NOT NULL,
PRIMARY KEY (`forumID`),
UNIQUE KEY `heritage` (`heritage`),
KEY `parentID` (`parentID`)
) ENGINE=MyISAM AUTO_INCREMENT=11551 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
/*!40101 SET character_set_client = @saved_cs_client */;
--
-- Dumping data for table `forums`
--
LOCK TABLES `forums` WRITE;
/*!40000 ALTER TABLE `forums` DISABLE KEYS */;
INSERT INTO `forums` VALUES (8003,'💡 Gamers\' Plane development',NULL,'f',2,'0002-8003',3180,3181,4);
/*!40000 ALTER TABLE `forums` ENABLE KEYS */;
UNLOCK TABLES;
To update the table to utf8mb4 I did
ALTER TABLE forums CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
How I'm testing the return of that data:
<?php
$mysql = new PDO("mysql:host=mysql;dbname=gamersplane", 'gamersplane', 'mypass');
$mysql->setAttribute(PDO::ATTR_DEFAULT_FETCH_MODE, PDO::FETCH_ASSOC);
$mysql->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$forum = $mysql->query('select * from forums where forumID = 8003')->fetch();
?>
<html>
<header>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</header>
<body>
<?php print_r($forum['title']); ?>
</body>
</html>
I've read "UTF-8 all the way through" and have:
- utf8mb4 in the database
- charset=utf8mb4 in my PDO string
- default_charset explicitly set in my php.ini, and I also tried setting it at runtime
- Content-Type: text/html; charset=utf-8 sent as a PHP header, as well as an HTML meta tag

Encoding in databases is always a lot of fun! Unfortunately, changing the character set doesn't update the data, only how the database interprets it. MySQL also doesn't change the encoding on the fly; it always writes the bytes as they come from the client. From your example you can see that ðŸ’¡ is what the UTF-8 bytes of 💡 look like when read as latin1, so when you dump the data it is dumped in the already-incorrect encoding.
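The byte-level effect can be reproduced outside MySQL. This is a minimal sketch, assuming PHP's mbstring extension is loaded and treating MySQL's latin1 as Windows-1252 (cp1252), which is an approximation; the variable names are only for illustration.

```php
<?php
// UTF-8 encoding of U+1F4A1 (light bulb) is the four bytes F0 9F 92 A1.
$emoji = "\xF0\x9F\x92\xA1";

// Misread each of those bytes as a cp1252 character and re-encode the
// resulting four characters as UTF-8. This is the double-encoded form.
$garbled = mb_convert_encoding($emoji, 'UTF-8', 'Windows-1252');

// Reverse the mistake: map each character back to its single cp1252 byte,
// which yields the original UTF-8 byte sequence again.
$restored = mb_convert_encoding($garbled, 'Windows-1252', 'UTF-8');

echo bin2hex($emoji), "\n";    // bytes of the real emoji
echo bin2hex($garbled), "\n";  // bytes of the mojibake text
echo $restored === $emoji ? "round trip OK\n" : "round trip failed\n";
```

Comparing these hex strings with the output of SELECT HEX(title) on the affected row should show which of the two forms is actually stored.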
To verify the issue you can try to convert the data with the query:
SELECT
CONVERT(BINARY(CONVERT(title USING latin1)) USING utf8mb4)
FROM forums
WHERE forumID = 8003;
Run this in your latest MySQL 8 environment; it should display the emoji correctly. If so, dump the data again, this time using the charset it was originally encoded with, most likely latin1, by passing --default-character-set=latin1. The dump file should then contain emojis instead of ðŸ’¡-like text.
Be aware that if the table has new content, it will be double encoded, or the dump will fail if the new text is not compatible with latin1. It would be better to dump from the original data set, if you still have access to it.
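If old and new rows are already mixed, one option is to repair values one at a time in PHP and leave correctly encoded text alone. The following is only a sketch under assumptions: `repairDoubleEncoded` is a hypothetical helper name, mbstring is assumed to be available, and latin1 is approximated as Windows-1252. Bytes that cp1252 leaves undefined (such as 0x81) may not round-trip, in which case the value is returned unchanged.

```php
<?php
/**
 * Hypothetical helper: undo one layer of latin1/UTF-8 double encoding,
 * but only when doing so is lossless and produces valid UTF-8.
 */
function repairDoubleEncoded(string $value): string
{
    // Map every character back to a single cp1252 byte.
    $bytes = mb_convert_encoding($value, 'Windows-1252', 'UTF-8');

    // If the value holds characters that cp1252 cannot represent (a real
    // emoji, for example), the conversion is lossy; leave the value alone.
    if (mb_convert_encoding($bytes, 'UTF-8', 'Windows-1252') !== $value) {
        return $value;
    }

    // Accept the result only if something changed and the recovered bytes
    // form valid UTF-8; otherwise the text was not double encoded.
    if ($bytes !== $value && mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes;
    }

    return $value;
}

// Usage: a double-encoded title is repaired, a correct one is untouched.
$old = "\xC3\xB0\xC5\xB8\xE2\x80\x99\xC2\xA1 Gamers' Plane development";
$new = "\xF0\x9F\x92\xA1 already correct";
echo repairDoubleEncoded($old), "\n";
echo repairDoubleEncoded($new), "\n";
```

The check is a heuristic: text that intentionally contains a sequence such as "Ã©" would be rewritten too, so it is worth trying on a copy of the data first.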