Search code examples
mysqlmyisamsqldatatypes

Space increased in MyISAM table after change column type from INT to SMALLINT and TINYINT


I'm trying to reduce the space occupied by my DB tables and optimize the performance of my system. I'm interested to optimize specially some MyISAM table with large amount of data. So I changed the column type as follow:
- from INT(11) to BIT(1) or TINYINT(3) or SMALLINT(6) or MEDIUMINT(9)
- from VARCHAR(...) to CHAR(x), with x the smallest number of characters useful

But the result was unexpected.
I ran the following query before and after my changes:

SELECT table_name AS "Table", round(((data_length + index_length) / 1024 / 1024), 2) "Size in MB" FROM information_schema.TABLES WHERE table_schema = "my_schema" AND table_name = "my_table";

Results:
- BEFORE 558 Mb
- AFTER 673,96 Mb

The space is increased. Why?

Edit:
The problem was CHAR conversion. In my case VARCHAR occupies less space because the fields are not of fixed dimension, so fixed CHAR fields need more space.
With VARCHAR my table occupies 444 Mb.


Solution

  • The INT changes should have helped.

    The VARCHAR changes should have hurt.

    Do not use CHAR unless you (1) the data is constant length, and (2) the CHARACTER SET is appropriate.

    VARCHAR is implemented as 1 or 2 bytes of length, plus only as many bytes as are needed for each string.

    CHAR(N) always occupies M*N bytes: M bytes/character depends on the charset: ascii/latin1: 1; utf8: 3, utf8mb4: 4.

    There are a very few cases where CHAR is actually better than VARCHAR. Here are some:

    country_code CHAR(2) CHARACTER SET ascii,
    md5 CHAR(32) CHARACTER SET ascii, -- packing into BINARY(16) would be tighter
    zip_code CHAR(5) CHARACTER SET ascii, -- MEDIUMINT(5) UNSIGNED ZEROFILL would be tighter
    uuid CHAR(36) CHARACTER SET ascii, -- Could be packed into BINARY(16)
    

    Hmmm... That's all I can think of at the moment.

    And, no, MyISAM does not benefit from "FIXED" instead of "DYNAMIC", except in rare cases. Since VARCHAR is smaller, there is less I/O; I/O is the main cost of dealing with data.

    Next: Forget about MyISAM and move up to InnoDB.