Tags: mysql, large-data, bigdata

MySQL Large Datasets


I have a large set of data, over 40GB, that I loaded into a MySQL table. I am trying to perform simple queries like select * from tablename, but it takes a gazillion minutes to run and eventually times out. If I set a limit, the execution is fairly fast, e.g. select * from tablename limit 1000. The table has over 200 million records.

  • I tried creating indexes on some columns, but that failed too after 3 hours of execution.

Any tips on working with these types of datasets?


Solution

  • For a dataset this large, one should consider an alternative solution such as Apache Spark. MySQL reads the data from disk, which is a slow operation, and a distributed, MapReduce-based engine can scan data of this size far faster than a single server. Take a look at this answer. It is true that with large databases, queries get very challenging.

    Anyway, assuming you want to stick with MySQL: first of all, if you are using MyISAM, make sure to convert your tables to the InnoDB storage engine. This is especially important if you have lots of concurrent read/write operations, since InnoDB locks rows rather than whole tables.
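    A minimal sketch of the conversion; mydb and tablename are placeholder names here:

        -- Check which storage engine the table currently uses
        SELECT table_name, engine
        FROM information_schema.tables
        WHERE table_schema = 'mydb' AND table_name = 'tablename';

        -- Rebuild the table as InnoDB; on 200+ million rows this
        -- copies the entire table, so expect a long-running operation
        ALTER TABLE tablename ENGINE = InnoDB;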

    It is also important to partition the table, which splits it into smaller, more manageable pieces. Queries that filter on the partitioning key only have to scan the relevant partitions, and the per-partition indexes are smaller and faster as well.
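    As a rough sketch, assuming the table has a DATE or DATETIME column such as created_at (a hypothetical name here) that queries commonly filter on; note that MySQL requires every unique key on a partitioned table to include the partitioning column:

        -- Partition by year so queries filtering on created_at
        -- only touch the relevant partitions (partition pruning)
        ALTER TABLE tablename
        PARTITION BY RANGE (YEAR(created_at)) (
            PARTITION p2021 VALUES LESS THAN (2022),
            PARTITION p2022 VALUES LESS THAN (2023),
            PARTITION p2023 VALUES LESS THAN (2024),
            PARTITION pmax  VALUES LESS THAN MAXVALUE
        );

    Running EXPLAIN on a query afterwards (EXPLAIN PARTITIONS on versions before 5.7) shows which partitions the query actually touches.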

    Do not be too generous with indexes; define them wisely. If an index does not need to be UNIQUE, do not define it as one, and if it does not need to span multiple columns, do not make it a composite index. Every index has to be maintained on each write and consumes disk space.
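    For example, a plain, non-unique, single-column index is often all a query needs (status is a hypothetical column name here). On InnoDB, MySQL 5.6 and later can usually build such an index online, so the table stays readable and writable during the long build:

        -- Non-unique, single-column secondary index; ALGORITHM=INPLACE
        -- and LOCK=NONE let reads and writes continue during the build
        CREATE INDEX idx_status ON tablename (status)
            ALGORITHM=INPLACE LOCK=NONE;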

    Most importantly, start monitoring your MySQL instance. Use SHOW ENGINE INNODB STATUS to investigate what the engine is actually doing: buffer pool usage, pending file I/O, lock waits, and recent deadlocks.
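    A few standard starting points; the 1-second slow-query threshold below is just an example value:

        -- Detailed InnoDB internals: buffer pool, file I/O, locks, deadlocks
        SHOW ENGINE INNODB STATUS;

        -- What is running right now, and for how long
        SHOW FULL PROCESSLIST;

        -- Log every statement that takes longer than 1 second
        SET GLOBAL slow_query_log = 'ON';
        SET GLOBAL long_query_time = 1;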