Does InnoDB store multibyte strings in expanded form in indexes?
For example, does each utf8mb4 string take 4 bytes?
I've tried to test this empirically using information_schema.tables.index_length. However, the value is not deterministic, so it's not a reliable method. I also couldn't find this covered in the documentation.
Edit: to clarify, the question in a nutshell is: how many bytes are required to store a 1-byte utf8mb4 character (say, U+0050) in an InnoDB index on a CHAR(1) NOT NULL column (not taking into account the index metadata)?
All characters in a utf8 string are stored as variable-length encodings. Each character uses 1, 2, 3, or 4 bytes depending on its code point. A single string can mix characters of different lengths, because the leading bits of the first byte of each encoded character indicate how many bytes that character occupies.
Characters in the ASCII subset use only 1 byte each.
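To make the variable-length rule concrete, here is a minimal Python sketch. It only illustrates the UTF-8 encoding itself and does not measure InnoDB's on-disk index size. The sample characters and the helper names utf8_len and len_from_lead_byte are my own choices for illustration.

```python
# One sample character per UTF-8 encoded length (chosen for illustration).
samples = ["P", "\u00e9", "\u20ac", "\U0001F600"]  # U+0050, U+00E9, U+20AC, U+1F600


def utf8_len(ch: str) -> int:
    """Number of bytes the character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def len_from_lead_byte(lead: int) -> int:
    """Sequence length announced by the leading bits of the first byte."""
    if lead & 0b1000_0000 == 0b0000_0000:  # 0xxxxxxx: ASCII
        return 1
    if lead & 0b1110_0000 == 0b1100_0000:  # 110xxxxx
        return 2
    if lead & 0b1111_0000 == 0b1110_0000:  # 1110xxxx
        return 3
    if lead & 0b1111_1000 == 0b1111_0000:  # 11110xxx
        return 4
    raise ValueError("not a valid UTF-8 leading byte")


for ch in samples:
    encoded = ch.encode("utf-8")
    # The length read from the first byte matches the actual encoded length.
    print(f"U+{ord(ch):04X}: {utf8_len(ch)} byte(s), "
          f"lead byte announces {len_from_lead_byte(encoded[0])}")
```

U+0050 from the question falls in the ASCII range, so its encoding is the single byte 0x50.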