I need to save around 7.8 billion records in a MySQL table. The table is both read and write intensive. I have to sustain an insertion rate of at least 0.02 billion records per hour (roughly 5,500 rows per second), and a search on the table should not take more than 10 seconds. We have a UI from which users can search based on different column properties.
The most frequently run queries look like:
select * from mytable where prop1='sip:+100008521149' and
prop2='asdsa'
order by event_timestamp desc limit 10;
select * from mytable where prop1='sip:+100008521149'
order by event_timestamp desc limit 10;
select * from mytable where prop2='asdsa'
order by event_timestamp desc limit 10;
Currently there are 2 indexes on the table:
1- idx_1(prop1,event_timestamp)
2- idx_2(prop2,event_timestamp)
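For clarity, a sketch of how those indexes are defined and checked (table and column names taken from the queries above; the exact DDL may differ):

-- composite indexes: the WHERE column first, then event_timestamp,
-- so the ORDER BY ... DESC LIMIT 10 can be resolved from the index
ALTER TABLE mytable ADD INDEX idx_1 (prop1, event_timestamp);
ALTER TABLE mytable ADD INDEX idx_2 (prop2, event_timestamp);

-- quick check that the optimizer actually uses them
EXPLAIN SELECT * FROM mytable
WHERE prop1 = 'sip:+100008521149'
ORDER BY event_timestamp DESC LIMIT 10;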
InnoDB settings are as follows:
innodb_buffer_pool_size = 70G
innodb_log_file_size = 4G
innodb_io_capacity=2000
innodb_io_capacity_max=6000
innodb_lru_scan_depth=2000
innodb_flush_log_at_trx_commit=2
innodb_log_buffer_size=16M
innodb_thread_concurrency = 0
innodb_read_io_threads = 64
innodb_write_io_threads = 64
innodb_autoinc_lock_mode = 2
bulk_insert_buffer_size=33554432
query_cache_type=1
query_cache_size=64M
innodb_flush_neighbors=0
expire_logs_days=10
max-connections=500
long_query_time = 5
read_buffer_size=16M
sort_buffer_size=16M
read_rnd_buffer_size=16M
innodb_doublewrite = 0
innodb_flush_method=O_DIRECT
Machine's RAM size is 99 GB.
Once started, the system was fast, but performance dropped a lot when the record count reached 0.22 billion. Even though we are using LOAD DATA INFILE, the insertion rate became very slow. Search was still pretty fast when filtering on indexed columns. It looks like the buffer pool is not big enough.
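One way to sanity-check that suspicion is with the standard InnoDB status counters; a high number of disk reads relative to read requests means the working set does not fit in the pool:

-- Innodb_buffer_pool_reads: logical reads that had to go to disk
-- Innodb_buffer_pool_read_requests: all logical read requests
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';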
I have a few questions:
Is it possible to support this kind of data volume with this configuration?
What should the ideal and practical buffer pool size be for 7 billion records?
UPDATE 1:
Q- The table is much bigger than RAM, correct? The buffer_pool cannot be made big enough -- it must be smaller than RAM, or else performance suffers.
A- RAM size is 100 GB, buffer pool is 70 GB. Yes, the data size is much bigger than the RAM.
Q- Please provide SHOW CREATE TABLE; there are several issues I need to investigate. (datatypes, field sizes, etc)
A- All the fields are string type; we have used varchar(127) for all of them. The PK is an auto-generated id, bigint(20).
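I don't have the full SHOW CREATE TABLE output here; a simplified sketch of the structure (illustrative column names, the real table has many more varchar(127) columns):

CREATE TABLE mytable (
  id              BIGINT(20) NOT NULL AUTO_INCREMENT,
  prop1           VARCHAR(127),
  prop2           VARCHAR(127),
  event_timestamp VARCHAR(127),  -- stored as a string, like every other field
  -- ... remaining varchar(127) columns ...
  PRIMARY KEY (id),
  KEY idx_1 (prop1, event_timestamp),
  KEY idx_2 (prop2, event_timestamp)
) ENGINE=InnoDB;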
Q- How many records in the LOAD DATA INFILE? Do you LOAD directly into the table? How often is the LOAD?
A- 100,000 records per file. Multiple threads load data from CSV files into the DB. During the initial migration we have to keep loading continuously until we reach 0.65 billion records; after that the frequency will drop to roughly one load every 15 minutes.
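The loads are roughly of this shape, run in parallel by several threads on different files (file path, delimiters, and column list are illustrative):

-- one CSV file = 100000 rows
LOAD DATA INFILE '/path/to/batch.csv'
INTO TABLE mytable
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(prop1, prop2, event_timestamp /* , ...remaining columns... */);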
Q- Master+Slave: Keep in mind that all writes are performed on the Slave, too. If you have lots of reads, then more than one Slave would spread the reads around, thereby getting some scaling.
A- We are currently testing a MASTER/SLAVE approach.
The MASTER uses MyISAM and has no indexes; it is used only for inserts. The SLAVE uses InnoDB with the 2 indexes, and all searches are performed on it. They are separate machines, sharing neither RAM nor CPU. The application runs on a third machine.
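To make that setup concrete, a sketch of what would be run once on the SLAVE only (the master copy of the table stays MyISAM and index-free):

-- SLAVE only: store the replicated table in InnoDB and add the search indexes
ALTER TABLE mytable ENGINE=InnoDB;
ALTER TABLE mytable
  ADD INDEX idx_1 (prop1, event_timestamp),
  ADD INDEX idx_2 (prop2, event_timestamp);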
Q- Do you have spinning drives? Or SSDs?
A- How do I check that?
Q- Your rows seem to be quite large. Are there TEXTs or BLOBs? If so, SELECT * may be a serious performance burden.
A- Yes, the rows have 50 columns, but data is present in only around 15-20 of them. We can't reduce the datatype sizes, because every field may hold an arbitrary amount of alphanumeric data. All are TEXTs, no BLOBs.
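If the UI only needs a handful of columns, a narrower query instead of SELECT * would look like this (column list is illustrative):

-- fetch only the columns the UI actually displays instead of all ~50
SELECT id, prop1, prop2, event_timestamp
FROM mytable
WHERE prop2 = 'asdsa'
ORDER BY event_timestamp DESC
LIMIT 10;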
I achieved this requirement by replacing the MySQL DB with Elasticsearch. It turned out to be a perfect fit for a fast insertion rate and damn fast search. Moreover, the full-text capabilities of Lucene make it a perfect tool. The best part of ES is that it has very low hardware requirements; it scales horizontally rather than vertically.