Search code examples
mysqlutf-8emojiutf8mb4

Is it possible to enable emojis for specific columns of specific tables?


First, I would like to assure you that I have done my "homework" and have read this, this, this and this. Also, one of my former questions is closely related to this one, but in that question I am dealing about flourishlib's compatibility issues with utf8mb4. This question deals with a deeper level. Let's suppose that I have several tables and I want to modify just a few columns to have utf8mb4 encoding, to preserve some storage space and performance after the change. If I changed the whole database to have an encoding of utf8mb4, then its size would increase with 33%, which will affect its performance badly as well. So, we have selected four columns from three different tables to support emojis. These are:

  • users.bio (tinytext, utf8_general_ci)
  • questions.question (longtext, utf8_general_ci)
  • questions.answer (longtext, ut8_general_ci)
  • comments.comment (tinytext, utf8_general_ci)

As a result, my action plan is as follows:

  1. Create a backup of the database

  2. Run these commands:

alter table comments change comment comment tinytext character set utf8mb4 collate utf8mb4_unicode_ci;
alter table users change bio bio tinytext character set utf8mb4 collate utf8mb4_unicode_ci;
alter table questions change question question longtext character set utf8mb4 collate utf8mb4_unicode_ci;
alter table questions change answer answer longtext character set utf8mb4 collate utf8mb4_unicode_ci;

Expectations:

  • this should make the specified columns use utf8mb4 instead of utf8
  • existent data will be correctly converted to utf8mb4, that is, previous texts will be preserved and the users will be able to correctly read their content
  • other columns will not be changed
  • queries involving the affected tables will be slower

Are my expectations accurate? Do I need to change the connection? Thanks


Solution

  • You need utf8mb4 in any columns that are storing Chinese.

    In VARCHAR(...) utf8mb4, each "character" takes 1-4 bytes. No 33% increase. On the other hand, CHAR(10) utf8mb4 is always allocated 40 bytes.

    You do need to establish that your client is talking utf8mb4, not just utf8. That comes in some parameter in the connection or with SET NAMES utf8mb4.

    If you need to automate the ALTERs, it is pretty easy to generate them via a SELECT into information_schema.

    Addenda

    Expectations 1-3: Yes.

    Expectation 4 (queries involving the affected tables will be slower) -- processing will be essentially the same speed.