First, I would like to assure you that I have done my "homework" and have read this, this, this and this. Also, one of my former questions is closely related to this one, but in that question I am dealing about flourishlib's compatibility issues with utf8mb4. This question deals with a deeper level. Let's suppose that I have several tables and I want to modify just a few columns to have utf8mb4 encoding, to preserve some storage space and performance after the change. If I changed the whole database to have an encoding of utf8mb4, then its size would increase with 33%, which will affect its performance badly as well. So, we have selected four columns from three different tables to support emojis. These are:
users.bio
(tinytext, utf8_general_ci)questions.question
(longtext, utf8_general_ci)questions.answer
(longtext, ut8_general_ci)comments.comment
(tinytext, utf8_general_ci)As a result, my action plan is as follows:
Create a backup of the database
Run these commands:
alter table comments change comment comment tinytext character set utf8mb4 collate utf8mb4_unicode_ci;
alter table users change bio bio tinytext character set utf8mb4 collate utf8mb4_unicode_ci;
alter table questions change question question longtext character set utf8mb4 collate utf8mb4_unicode_ci;
alter table questions change answer answer longtext character set utf8mb4 collate utf8mb4_unicode_ci;
Expectations:
Are my expectations accurate? Do I need to change the connection? Thanks
You need utf8mb4 in any columns that are storing Chinese.
In VARCHAR(...)
utf8mb4, each "character" takes 1-4 bytes. No 33% increase.
On the other hand, CHAR(10) utf8mb4
is always allocated 40 bytes.
You do need to establish that your client is talking utf8mb4, not just utf8. That comes in some parameter in the connection or with SET NAMES utf8mb4
.
If you need to automate the ALTERs
, it is pretty easy to generate them via a SELECT into information_schema.
Addenda
Expectations 1-3: Yes.
Expectation 4 (queries involving the affected tables will be slower) -- processing will be essentially the same speed.