Tags: mysql, performance, join, indexing

MySQL: create index on a table with 100 million rows


I have a few MySQL tables with around 300 columns and 100 million rows each. They store data from log files, hence the size. I am using the InnoDB engine. A few queries that join these tables, unsurprisingly, do not complete. I tried adding indexes to these tables, but those statements never finish either.

I wanted to know if there is any other way to speed up performance, or some way to make CREATE INDEX complete on these tables.

Thank you.


Solution

  • Creating an index takes time, proportional to the number of rows in the table. 100 million rows is quite a lot for a MySQL table. It will probably take many hours to create an index on that table. Exactly how long varies based on other factors, including your server hardware, the data types of the columns you are indexing, other concurrent load on the database, and so on.
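
    For a sense of what is being timed, the statement itself is just an ordinary ALTER TABLE ... ADD INDEX; the table and column names below are hypothetical, chosen only for illustration:

        -- Hypothetical log table and column; on a 100-million-row InnoDB table
        -- this can run for hours and may block concurrent writes, depending on
        -- your MySQL version.
        ALTER TABLE access_log ADD INDEX idx_request_time (request_time);

        -- You can watch the long-running session while it works:
        SHOW PROCESSLIST;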

    One tool that can help you is pt-online-schema-change. It actually takes longer to build the index, but you can continue to read and write the original table while it's working. Test with a smaller table first so you get some experience using this tool.
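
    A typical invocation looks roughly like this; the database, table, and column names are placeholders, and you should verify the options against the version of Percona Toolkit you have installed:

        # Dry run first to see what the tool would do (D= database, t= table; names are hypothetical)
        pt-online-schema-change --alter "ADD INDEX idx_request_time (request_time)" \
            D=logdb,t=access_log --dry-run

        # The real run: rows are copied in chunks and the tables are swapped at the end
        pt-online-schema-change --alter "ADD INDEX idx_request_time (request_time)" \
            D=logdb,t=access_log --execute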

    You can view a webinar about this tool here: Zero-Downtime Schema Changes in MySQL (free to view, but requires registration).

    Another technique is to create an empty table with the same structure as your original, create the index on that empty table, and then gradually copy data from the original table into the new one. If this is a log table, you likely write to it far more often than you read from it, so you can probably swap the tables right away, start logging new events to the indexed copy immediately, and backfill the historical rows over time.
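
    In SQL, that approach is roughly the following (same hypothetical names as above):

        -- Build an empty copy of the table and add the index while it is empty
        CREATE TABLE access_log_new LIKE access_log;
        ALTER TABLE access_log_new ADD INDEX idx_request_time (request_time);

        -- Swap the tables atomically so new log events go to the indexed copy
        RENAME TABLE access_log TO access_log_old,
                     access_log_new TO access_log;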

    A tool like pt-archiver can help you to copy data gradually without putting too much load on the server. Simply doing INSERT INTO... SELECT is not good for your database server's health if you try to copy 100 million rows in one transaction. It also puts locks on the original table. pt-archiver works by copying just a bite-sized chunk of rows at a time, so it avoids the high cost of such a large transaction.
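
    Something along these lines would backfill the old rows in small chunks; the DSNs and chunk size are illustrative, so check them against your environment:

        # Copy rows from the old table into the new one, 1000 at a time,
        # committing each chunk and leaving the source rows in place
        pt-archiver --source D=logdb,t=access_log_old \
                    --dest   D=logdb,t=access_log \
                    --where  "1=1" --limit 1000 --commit-each --no-delete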

    If you use an auto-increment primary key, take care to set the new table's auto-increment value higher than the maximum id in the original table before you let log events start writing to it, so you don't assign the same id value more than once.
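
    For example, with the same hypothetical names, and before doing the RENAME TABLE swap shown above:

        -- Find the highest id in the original table...
        SELECT MAX(id) FROM access_log;

        -- ...then set the new table's counter safely above it (the number is an example)
        ALTER TABLE access_log_new AUTO_INCREMENT = 100000001;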