Many tables will do fine with CHARACTER SET ascii COLLATE ascii_bin, which is slightly faster. Here's an example:
CREATE TABLE `session` (
  `id` CHAR(64) NOT NULL,
  `user_id` INTEGER NOT NULL,
  `created_at` INTEGER NOT NULL,
  `modified_at` INTEGER NOT NULL,
  PRIMARY KEY (`id`),
  CONSTRAINT FOREIGN KEY (`user_id`) REFERENCES `user`(`id`)
) CHARACTER SET ascii COLLATE ascii_bin;
But if I were to join it with:
CREATE TABLE `session_value` (
  `session_id` CHAR(64) NOT NULL,
  `key` VARCHAR(64) NOT NULL,
  `value` TEXT,
  PRIMARY KEY (`session_id`, `key`),
  CONSTRAINT FOREIGN KEY (`session_id`) REFERENCES `session`(`id`) ON DELETE CASCADE
) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
what's gonna happen? Logic tells me it should be seamless, because ASCII is a subset of UTF-8. Human nature tells me I can expect anything from a core dump to the message "Follow the white rabbit." appearing on my screen. ¯\_(ツ)_/¯
Does joining ASCII and UTF-8 tables add overhead?
Yes.
If you do
SELECT whatever
FROM session s
JOIN session_value v
ON s.id = v.session_id
the query engine must compare many values of id and session_id to satisfy your query.
If id and session_id have exactly the same datatype, character set, and collation, the query planner can exploit indexes and fast comparisons.
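If you want to see what the planner is actually doing, EXPLAIN will tell you. Here's a rough sketch against the tables above (the column list is just illustrative):

-- EXPLAIN shows whether the join can use the PRIMARY KEY index on
-- session_value: a join type of ref or eq_ref means the index is used,
-- while ALL means a full table scan.
EXPLAIN
SELECT s.id, v.`key`, v.`value`
FROM session s
JOIN session_value v ON s.id = v.session_id;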
But if they have different character sets, the query planner must interpret your query as follows:
... JOIN session_value v
ON CONVERT(s.id USING utf8mb4) = v.session_id
When a WHERE or ON condition has the form f(column), the query becomes non-sargable: the function call prevents efficient index use. That can hammer query performance.
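To make that concrete (my example, not part of your query), here is the same lookup written both ways, using the session table above:

-- Sargable: the bare column can be matched directly against its index.
SELECT * FROM session WHERE id = 'abc123';

-- Non-sargable: wrapping the column in a function hides it from the
-- index, so the server evaluates CONVERT() for every row.
SELECT * FROM session WHERE CONVERT(id USING utf8mb4) = 'abc123';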
In your case, similar performance problems will occur when you insert rows into session_value: the server must do the same conversion to check your foreign key constraint.
If these tables are going to production, you'd be very wise to use the same character set for these columns. It's much easier to fix this when you have thousands of rows than when you have millions. Seriously.
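A sketch of that fix, assuming the table definitions above. This rewrites every row of the table, so test it on a copy first and expect it to take time on a big table:

-- Align session_value.session_id with session.id. You may need to drop
-- and re-create the foreign key around this change.
ALTER TABLE `session_value`
  MODIFY `session_id` CHAR(64)
    CHARACTER SET ascii COLLATE ascii_bin NOT NULL;

Only the join column needs converting here; `value` stays utf8mb4 so it can still hold arbitrary text.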