I'm aware that utf8
in MySQL is stored as a fixed length 3 bytes (24 bits) and utf4mb4
is a fixed 4 bytes. I presume Latin 1 to be a fixed length 1 byte - my question is is the mysql ascii type 7 bits and is there a slight storage savings over Latin-1?
Your initial premise is incorrect. In VARCHAR
as well as *TEXT
columns, utf8 and utf8mb4 characters only consume as many bytes of storage as are necessary for the utf8 representation of each character. The only fixed multibyte allocation is for CHAR
columns, since they are fixed width.
Tip: To save space with UTF-8, use VARCHAR instead of CHAR. Otherwise, MySQL must reserve three bytes for each character in a CHAR CHARACTER SET utf8 column because that is the maximum possible length. For example, MySQL must reserve 30 bytes for a CHAR(10) CHARACTER SET utf8 column.
http://dev.mysql.com/doc/refman/5.6/en/charset-unicode-utf8.html
Otherwise, when storing utf8 and utf8mb characters, the maximum bytes per character is not used for each character:
when using the utf8 (or utf8mb4) Unicode character set, you must keep in mind that not all characters use the same number of bytes and can require up to three (four) bytes per character. [emphasis added]
http://dev.mysql.com/doc/refman/5.6/en/storage-requirements.html
There is no 7 bit storage-byte-spanning character set.
However, compressing the entire table can be done in newer versions, and if you have sufficient CPU, the server can actually perform better with table compression enabled, if your disks are relatively slow or your server spends a lot of time being I/O-bound.
http://dev.mysql.com/doc/refman/5.6/en/innodb-compression.html