Search code examples
mysqlinnodb

innodb save storage space


I want to ask for mysql innodb what do you suggest for getting a smaller database?

I do not have innodb_file_per_table but I do not need to reclaim the space(this would have been nice to see how much space I was able to save).

Most of the strings are saved as utf8mb4. For some columns I could use latin1(I know that there can be there only latin characters). Do you think I could save a lot of space if I convert them to latin1? It seems that also this change should get you a performance increase in searches.

I have defined the columns that have a lot of text as MEDIUMTEXT . Do you think I should save some space if I define that column as TEXT? - from what I see this type acts like VARCHAR(it occupies only the length of the string)

What other suggestions do you have?

The database has more than 300 millions rows and is stored in about 100G

Thanks


Solution

  • Changing the character set will not help. When you use utf8 or utf8mb4, each character is stored in a variable number of bytes. Characters that can be stored in a single byte are stored that way.

    Changing from MEDIUMTEXT to TEXT will not help much. Each string in such columns is stored in a variable length, only up to the length needed by the string you stored. TEXT can store strings up to 64KB, MEDIUMTEXT can store strings up to 16MB. I suppose each such string may need a single length indicator, which would be two bytes per TEXT and three bytes per MEDIUMTEXT. So you might save at most 300MB per column over your entire database (and probably not even that much). This won't make enough of a difference.

    You can find out how much free space you have in your tablespace. Run SHOW TABLE STATUS LIKE 'sometable' where "sometable" is the name of any of the tables in your tablespace.

    One of the fields returned is data_free. This is the free space in bytes in the tablespace. When you have multiple tables together in the same global tablespace, every table reports the same value. This doesn't mean your free space is the sum total of all these figures, it's actually one free space that is repeated in each table status.

    To save space, some people declare InnoDB tales with ROW_FORMAT=COMPRESSED but this requires that you use file-per-table. Since you already have your tables in the global tablespace, even if you restructure the tables as file-per-table, it won't shrink the global tablespace. As you move tables into their own files, it'll just leave a giant global tablespace that is mostly empty. So that will just make your storage problem worse.

    The only thing I can suggest is that you do the following steps. No one can use your database while you're doing this.

    1. Dump all data from your InnoDB tables, saving the output of the dump to another volume if necessary. Wherever there's space. You can compress the output of the dump like:

      mysqldump ... | gzip -c > dump.sql.gz
      
    2. Stop your mysqld process.

    3. Remove your whole global tablespace, i.e. rm /var/lib/mysql/ibdata1, and any *.ibd files you might still have. You should take a backup first, of course.
    4. Enable innodb_file_per_table in your /etc/my.cnf. Also enable innodb_file_format=Barracuda depending on the version of MySQL.
    5. Start your mysqld process. It'll automatically recreate the global tablespace as a new, small file.
    6. Restore the tables you dumped. They'll be put into individual InnoDB files, not the global tablespace.
    7. If necessary, ALTER each table to use ROW_FORMAT=COMPRESSED.

    This obviously takes a long time to dump and reload 300M rows. It will take many hours, and your database will not be usable during that time.

    If you can't make your database unavailable while you do this procedure, you'll have to do it on a replica, and when the procedure is done and the replica is in sync with the master, then you can quickly substitute the replica for the master. That will still cause a brief interruption in service while you make the switch, but it will be quick.

    Next time start with a larger storage volume for your database server. Project the amount of storage you'll need and plan for it.


    Re your comment, that you changed from MEDIUMTEXT to TEXT and saved space.

    The sizes in INFORMATION_SCHEMA (which are the same as those reported by SHOW TABLE STATUS) are only estimates, and they can be out of date or otherwise way off. Running ANALYZE TABLE once in a while is a way to update the statistics.

    A table can also be fragmented, and rebuilding it once in a while can reclaim some of that space. Use OPTIMIZE TABLE.

    Another possibility is that your MEDIUMTEXT columns were actually storing longer text strings than could fit in a TEXT column, and your ALTER TABLE truncated them.

    Here's a demonstration:

    mysql> create table m ( m mediumtext);
    
    mysql> insert into m set m = repeat('X', 1024*1024*2);
    Query OK, 1 row affected (0.05 sec)
    
    mysql> select length(m) from m;
    +-----------+
    | length(m) |
    +-----------+
    |   2097152 |
    +-----------+
    
    mysql> alter table m modify column m text;
    Query OK, 1 row affected (0.01 sec)
    Records: 1  Duplicates: 0  Warnings: 0
    
    mysql> select length(m) from m;
    +-----------+
    | length(m) |
    +-----------+
    |         0 |
    +-----------+
    

    I filled a MEDIUMTEXT with 2MB of data, then used ALTER to change the column to TEXT. It didn't simply truncate to the 64KB that could fit in a TEXT column, it truncated the text down to zero characters.

    So I hope you didn't just wipe out all your text data.