Tags: indexing, utf-8, innodb, multibyte, utf8mb4

Does InnoDB store multibyte strings in expanded form in indexes?

For example, does each utf8mb4 character always take 4 bytes?

I've tried to test this empirically using information_schema.tables.index_length; however, the value is not deterministic, so it's not a reliable method. I also couldn't find this covered in the documentation.

Edit: to clarify, the question is, in a nutshell: how many bytes are required to store a 1-byte utf8mb4 character (say, U+0050) in an InnoDB index on a CHAR(1) NOT NULL column (not taking into account the index metadata)?


Solution

  • UTF-8 (MySQL's utf8mb4) is a variable-length encoding: each character uses 1, 2, 3, or 4 bytes depending on its code point. A single string can mix characters of different lengths, because the leading bits of each character's first byte indicate how many bytes the character occupies, and every continuation byte starts with the bits 10.


    Characters in the ASCII subset (U+0000 to U+007F), such as U+0050 (P), use only 1 byte.
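To make the encoding rules above concrete, here is a minimal Python sketch (assuming Python 3.9+). It shows how UTF-8 itself sizes characters; it does not model how InnoDB lays out rows or index records on disk, which also depends on the row format. The function names `utf8_length_from_code_point`, `utf8_length_from_lead_byte` and `split_utf8` are hypothetical helpers written for illustration, not part of any MySQL or Python API.

```python
from __future__ import annotations


def utf8_length_from_code_point(cp: int) -> int:
    """Bytes UTF-8 needs for a Unicode scalar value (RFC 3629 ranges)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a valid Unicode scalar value: U+{cp:04X}")
    if cp <= 0x7F:
        return 1  # ASCII subset, e.g. U+0050 'P'
    if cp <= 0x7FF:
        return 2
    if cp <= 0xFFFF:
        return 3
    return 4  # supplementary planes (e.g. emoji); needs utf8mb4 in MySQL


def utf8_length_from_lead_byte(b: int) -> int:
    """Sequence length signalled by the high bits of a character's first byte.

    Simplified sketch: it does not reject overlong lead bytes (0xC0, 0xC1)
    or lead bytes beyond U+10FFFF (0xF5-0xF7).
    """
    if b >> 7 == 0b0:
        return 1  # 0xxxxxxx
    if b >> 5 == 0b110:
        return 2  # 110xxxxx
    if b >> 4 == 0b1110:
        return 3  # 1110xxxx
    if b >> 3 == 0b11110:
        return 4  # 11110xxx
    raise ValueError(f"0x{b:02X} is a continuation byte (10xxxxxx) or invalid")


def split_utf8(data: bytes) -> list[bytes]:
    """Split UTF-8 bytes into per-character chunks using only the lead bytes."""
    chunks = []
    i = 0
    while i < len(data):
        n = utf8_length_from_lead_byte(data[i])
        chunk = data[i:i + n]
        # Every byte after the lead byte must be a continuation byte 10xxxxxx.
        if len(chunk) != n or any(c >> 6 != 0b10 for c in chunk[1:]):
            raise ValueError(f"malformed sequence at offset {i}")
        chunks.append(chunk)
        i += n
    return chunks


if __name__ == "__main__":
    text = "P\u00e9\u20ac\U0001F600"  # 'P', 'é', '€', an emoji
    for ch, chunk in zip(text, split_utf8(text.encode("utf-8"))):
        assert len(chunk) == utf8_length_from_code_point(ord(ch))
        print(f"U+{ord(ch):04X} -> {chunk.hex(' ')} ({len(chunk)} byte(s))")
```

Running it prints one line per character, with 'P' (U+0050) encoded as the single byte 50 and the other three characters taking 2, 3 and 4 bytes. The second and third helpers mirror the point that the length is recoverable from the first byte alone, which is why a UTF-8 string can mix lengths freely.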