Search code examples
mysqlencodingutf-8asciilatin1

MySQL String Encoding Storage Space


I'm aware that utf8 in MySQL is stored as a fixed length 3 bytes (24 bits) and utf4mb4 is a fixed 4 bytes. I presume Latin 1 to be a fixed length 1 byte - my question is is the mysql ascii type 7 bits and is there a slight storage savings over Latin-1?


Solution

  • Your initial premise is incorrect. In VARCHAR as well as *TEXT columns, utf8 and utf8mb4 characters only consume as many bytes of storage as are necessary for the utf8 representation of each character. The only fixed multibyte allocation is for CHAR columns, since they are fixed width.

    Tip: To save space with UTF-8, use VARCHAR instead of CHAR. Otherwise, MySQL must reserve three bytes for each character in a CHAR CHARACTER SET utf8 column because that is the maximum possible length. For example, MySQL must reserve 30 bytes for a CHAR(10) CHARACTER SET utf8 column.

    http://dev.mysql.com/doc/refman/5.6/en/charset-unicode-utf8.html

    Otherwise, when storing utf8 and utf8mb characters, the maximum bytes per character is not used for each character:

    when using the utf8 (or utf8mb4) Unicode character set, you must keep in mind that not all characters use the same number of bytes and can require up to three (four) bytes per character. [emphasis added]

    http://dev.mysql.com/doc/refman/5.6/en/storage-requirements.html

    There is no 7 bit storage-byte-spanning character set.

    However, compressing the entire table can be done in newer versions, and if you have sufficient CPU, the server can actually perform better with table compression enabled, if your disks are relatively slow or your server spends a lot of time being I/O-bound.

    http://dev.mysql.com/doc/refman/5.6/en/innodb-compression.html