mysql · innodb · information-schema

Using information_schema to find the size of InnoDB databases


We have about 60-70 databases on an RDS server, and a lot of them can be deleted.

I want to do a benchmark of size before and after, and they are all (to my knowledge) InnoDB tables.

So, I'm using the information_schema tables per this link: https://www.percona.com/blog/2008/03/17/researching-your-mysql-table-sizes/ and this is great, except that the first query listed (and I presume the others) just runs and runs, eventually finishing after EIGHT MINUTES.
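For reference, the per-database size query in question is of this general shape (adapted from the style of that Percona post; `data_length` and `index_length` are standard columns of `information_schema.TABLES`):

```sql
-- Approximate on-disk size per database, in MB
SELECT table_schema,
       ROUND(SUM(data_length + index_length) / 1024 / 1024, 1) AS total_mb
FROM information_schema.TABLES
GROUP BY table_schema
ORDER BY total_mb DESC;
```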

I can run this query instantly:

SELECT COUNT(*) FROM information_schema.TABLES;

And get about 12,500 tables.

I also notice - ironically enough - that information_schema.TABLES has no indexes! My instinct is not to mess with that.

My best option at this point is to dump the TABLES table, and run the query on a copy that I actually index.
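That dump-and-index idea could be sketched like this (the `my_stats` schema and `tables_snapshot` name are made up for illustration):

```sql
-- Snapshot information_schema.TABLES into a real, indexable table.
-- The expensive "open every table" work happens once, here.
CREATE DATABASE IF NOT EXISTS my_stats;
CREATE TABLE my_stats.tables_snapshot AS
    SELECT * FROM information_schema.TABLES;

-- Indexes can be added to the copy, and repeated queries run against it
ALTER TABLE my_stats.tables_snapshot
    ADD INDEX idx_schema (TABLE_SCHEMA),
    ADD INDEX idx_engine (ENGINE);
```

Note the size figures in the copy are frozen at snapshot time, which relates to question 4 below.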

My questions are:

  1. How dynamic is the information_schema.TABLES table, and in fact that entire database?
  2. Why is it running so slowly?
  3. Would it be advisable to index some key fields to optimize the queries I want to do?
  4. If I do an SQL dump, will I be getting current table size information?

Thanks, I hope this question is instructive.


Solution

  • information_schema is currently a thin layer on top of some older stuff. The older stuff needed to "open" each table to discover its size, etc. That involved reading at least the .frm. But it did not need to open in order to count the number of tables. Think of the difference between SHOW TABLES and SHOW TABLE STATUS.

    table_open_cache and table_definition_cache probably did not have all the tables in them when you ran the 8-minute query. The values of those VARIABLES may well have been less than 12,500, implying that there would have been churn as tables were opened and evicted.

    In the future (the release that was floated as 5.8 and eventually shipped as MySQL 8.0), all that info will probably be sitting in a single InnoDB table instead of splayed across the OS's file system. At that point, it will be quite fast. (Think of how fast a table scan of 12,500 rows can be done, especially if fully cached in RAM.)

    Since the information_schema does not have "real" tables, there is no way to add INDEXes.

    mysqldump does not provide the table size info. Even if it did, it would be no faster, since it would go through the same, old, mechanism.

    60 is a questionably large number of databases; 12K is a large number of tables. Often this implies a schema design that creates many parallel tables instead of putting the data into a single table.
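The cache sizes mentioned above, and the SHOW TABLES vs SHOW TABLE STATUS difference, can be inspected directly (run against your own server; the numbers will of course vary):

```sql
-- If these values are well below the table count, churn during the
-- big information_schema query is likely
SHOW GLOBAL VARIABLES LIKE 'table_open_cache';
SHOW GLOBAL VARIABLES LIKE 'table_definition_cache';
SELECT COUNT(*) FROM information_schema.TABLES;

SHOW TABLES;        -- fast: lists table names without opening them
SHOW TABLE STATUS;  -- slow with many tables: must open each to get sizes
```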