
MySQL: Inserting Huge (Multi-GB) SQL Files


I'm trying to create a Wikipedia DB copy (around 50 GB), but I'm having problems with the largest SQL files.

I've split the multi-GB files into chunks of about 300 MB using the Linux split utility, e.g.

split -d -l 50 ../enwiki-20070908-page page.input.

On average, a 300 MB file takes 3 hours to import on my server. I'm running Ubuntu 12.04 Server and MySQL 5.5.

I'm importing them like this:

mysql -u username -ppassword database < category.sql

Note: these files consist of INSERT statements; they are not CSV files.

Wikipedia offers database dumps for download, so everybody can create a copy of Wikipedia. You can find example files here: Wikipedia Dumps

I think the import is slow because of my MySQL server settings, but I don't know what I should change. I'm using the standard Ubuntu MySQL config on a machine with a decent processor and 2 GB of RAM. Could someone help me out with a suitable configuration for my system?

I've tried setting innodb_buffer_pool_size to 1 GB, but to no avail.
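
For reference, this is roughly how such a change looks in the [mysqld] section of /etc/mysql/my.cnf on Ubuntu; innodb_buffer_pool_size is not dynamic in MySQL 5.5, so mysqld has to be restarted before it takes effect:

[mysqld]
# InnoDB buffer pool for data and indexes; 1G is simply the value I tried, not a tuned recommendation
innodb_buffer_pool_size = 1G

After the restart, SHOW VARIABLES LIKE 'innodb_buffer_pool_size'; shows whether the new value was picked up.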


Solution

  • Since you have far less than 50 GB of memory, you can't buffer the entire database in RAM, so the bottleneck is the write speed of your disk subsystem.

    Tricks to speed up imports (a combined sketch follows the note below):

    • MyISAM is not transactional, so it is much faster for single-threaded inserts. Try loading into MyISAM, then ALTER the table to InnoDB.
      • Use ALTER TABLE .. DISABLE KEYS to avoid updating the indexes row by row (MyISAM only).
      • Set bulk_insert_buffer_size above your insert size (MyISAM only).
      • Set unique_checks = 0 so that unique constraints are not checked.

    For more, see Bulk Data Loading for InnoDB Tables in the MySQL Manual.

    Note: If the original table has foreign key constraints, using MyISAM as an intermediate format is a bad idea.
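
    A minimal sketch of how these tricks might be combined in one mysql client session, assuming the page table was created with ENGINE=MyISAM beforehand; the chunk names follow the split example above, and the paths and buffer size are illustrative:

    -- Session settings for the bulk load
    SET SESSION bulk_insert_buffer_size = 300 * 1024 * 1024;  -- larger than a single 300 MB chunk
    SET SESSION unique_checks = 0;                            -- skip unique-constraint checks while loading

    -- Defer non-unique index maintenance (MyISAM only)
    ALTER TABLE page DISABLE KEYS;

    -- Load the split chunks (each file is plain INSERT statements)
    SOURCE /path/to/page.input.00;
    SOURCE /path/to/page.input.01;
    -- ...and so on for the remaining chunks

    -- Rebuild the indexes in one pass and restore normal checks
    ALTER TABLE page ENABLE KEYS;
    SET SESSION unique_checks = 1;

    -- Finally, convert the loaded table to InnoDB
    ALTER TABLE page ENGINE = InnoDB;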