The table below contains 10 million rows:
CREATE TABLE Sample1 (
c1 bigint(20) NOT NULL AUTO_INCREMENT,
c2 varchar(45) NOT NULL,
c3 tinyint(4) NOT NULL DEFAULT 0,
c4 tinyint(4) NOT NULL DEFAULT 0,
c5 varchar(45) DEFAULT NULL,
time bigint(20) DEFAULT NULL,
PRIMARY KEY (c1),
KEY varchar_time_idx (c2, Time),
KEY varchar_c3_time_idx (c2, c3, Time),
KEY varchar_c4_time_idx (c2, c4, Time),
KEY varchar_c3_c4_time_idx (c2, c3, c4, Time)
) ENGINE=InnoDB AUTO_INCREMENT=10093495 DEFAULT CHARSET=utf8;
Select
Four multi-column indexes were created to select rows with the following conditions in the WHERE clause:
1) c2 and time
ex: select c1, c5 from Sample1 where c2 = 'sometext' order by time limit 30;
2) c2 and c3 and time
ex: select c1, c5 from Sample1 where c2 = 'sometext' and c3 = <int> order by time limit 30;
3) c2 and c4 and time
ex: select c1, c5 from Sample1 where c2 = 'sometext' and c4 = <int> order by time limit 30;
4) c2 and c3 and c4 and time
ex: select c1, c5 from Sample1 where c2 = 'sometext' and c3 = <int> and c4 = <int> order by time limit 30;
To make the above selects faster, these four multi-column indexes were created.
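For what it's worth, EXPLAIN can confirm which index each pattern picks (illustrative; the constants are placeholders):

EXPLAIN select c1, c5 from Sample1
where c2 = 'sometext' and c3 = 1 and c4 = 2
order by time limit 30;
-- expected: key = varchar_c3_c4_time_idx, with no "Using filesort" in Extra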
Cardinality-wise, c2, c3 and c4 are very low (e.g., out of one million rows, c2, c3 and c4 each have about 100 unique values).
They are also not equally distributed. Each group in c2 has an uneven number of rows (e.g., c2 = 1 contains 100000 rows, c2 = 2 contains 1500000, and so on).
The time column (a timestamp in milliseconds) contains mostly unique values.
Selects happen infrequently (10 to 30 times an hour), but they must be fast.
Insert
Inserts happen very frequently, but they are processed sequentially (one after another).
Update
All updates are based on c1, the primary key. (Frequency: about 20% of the insert rate.)
update Sample1 set c3 = <int>, c4 = <int>, time = <current-time-in-ms> where c1 = <int>;
The table has 5 indexed fields (4 multi-column indexes). Because of this:
1) Inserts and updates on indexed fields become costlier.
2) As the table keeps growing (it may reach up to 100 million rows), the index size also increases rapidly.
Kindly suggest a good approach in MySQL to handle this use case.
Other Necessary Details
innodb_buffer_pool_size: 16106127360 (15 GB)
CPU cores: 32
RAM: 32 GB
Caution: TMI coming. I'm having to do some guessing; I can be more specific if you supply more details...
The 4 secondary keys you have are optimal for the 4 queries you listed.
Cardinality, contrary to a popular old wives' tale, has nothing to do with composite indexes and SELECT performance.
At 100M rows, the table (including indexes) will perhaps be 20GB. How much RAM do you have? What is the value of innodb_buffer_pool_size? Unless you have very little RAM, these probably won't matter.
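A quick way to compare the table's footprint to the buffer pool (a sketch; 'your_db' is a placeholder):

SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SELECT ROUND((data_length + index_length) / 1024 / 1024 / 1024, 1) AS total_gb
FROM information_schema.TABLES
WHERE table_schema = 'your_db' AND table_name = 'Sample1';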
Back to 'cardinality'.
Let's look at INDEX(c2, Time), where there are 100 distinct values for c2 and Time is essentially ever-increasing. Each new INSERT will put the new row in one of 100 spots -- the ends of each c2 clump. This implies 100 "hot spots", and it implies that 100 blocks are (mostly) sufficient to take care of updating this one index. 100 blocks = 1.6MB of the buffer_pool (at the default 16KB InnoDB block size) -- hopefully a tiny fraction.
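One way to sanity-check that those hot blocks stay cached (illustrative): if Innodb_buffer_pool_reads stays tiny relative to Innodb_buffer_pool_read_requests, nearly all page accesses are being served from the buffer pool.

SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';
-- read_requests = logical reads; reads = reads that had to go to disk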
Meanwhile, the PRIMARY KEY is AUTO_INCREMENT, so there is one hot spot, and one block -- an even tinier fraction.
But... the other 3 secondary keys will have more hot spots (blocks), so they could be more important. Let's go to the worst one, (c2, c3, c4, Time). Tentatively, that would have 100*100*100 hot spots. But I think that is more than the number of blocks in the entire index. (So the math falls apart.) That index will be rather busy.
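If you want real numbers instead of my guesses, the per-index size is available from the persistent statistics tables (assumes MySQL 5.6+ with persistent stats enabled; 'your_db' is a placeholder):

SELECT index_name,
       stat_value AS pages,
       ROUND(stat_value * @@innodb_page_size / 1048576) AS size_mb
FROM mysql.innodb_index_stats
WHERE database_name = 'your_db'
  AND table_name = 'Sample1'
  AND stat_name = 'size';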
A digression for a moment... How many rows do you INSERT in a transaction? How many rows/second? What is the value of innodb_flush_log_at_trx_commit (flatc)? Well, let's simplify it down to either one row fully flushed at a time versus lots of rows flushed in a batch.
Back to the computations...
At one extreme: small buffer_pool, single-row transactions, flatc=1, and HDD: you will need a few IOPs per row. I hope you don't need to insert more than 20 rows/second.
At the other extreme: large buffer_pool, batching, flatc=2, and SSD: an average of less than one IOP per row. You can probably handle more than 1000 rows being inserted per second.
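A minimal sketch of that second extreme (batching + flatc=2; the values are placeholders, not your real data):

SET GLOBAL innodb_flush_log_at_trx_commit = 2;  -- flush the log ~once/second instead of per commit
START TRANSACTION;
INSERT INTO Sample1 (c2, c3, c4, c5, time) VALUES
  ('sometext', 1, 2, 'payload', 1650000000000),
  ('othertext', 3, 4, 'payload', 1650000000001);
-- ... keep going; many rows per COMMIT amortizes the flush
COMMIT;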
Normalizing c2 might cut the 20GB estimate in half, thereby improving several of the numbers in the computations above.
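To make that concrete, here is a sketch of the normalization (the table and column names are hypothetical, not a finished design):

-- Hypothetical lookup table: each distinct c2 string stored once
CREATE TABLE c2_lookup (
  c2_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
  c2 varchar(45) NOT NULL,
  PRIMARY KEY (c2_id),
  UNIQUE KEY (c2)
) ENGINE=InnoDB;
-- Sample1 would then carry a 2-byte c2_id instead of varchar(45),
-- and all four secondary indexes leading with c2 would shrink accordingly.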
Back to the SELECTs -- are you really fetching 100K rows for a given c2? If you have more filtering, ORDERing, LIMITing, etc., please show them; it could make a big difference in this analysis.
Back to the title -- I don't yet see any useful way to change/minimize those indexes. They seem to be very useful for the SELECTs, and only minimally harmful to the INSERTs.
Oh, the UPDATEs. We need to see the WHERE clause on the UPDATEs before considering the ramifications there.
More (After several updates to question)
PRIMARY KEY(c1) takes care of making the UPDATEs as fast as possible (aside from the need to eventually update the indexes).
SELECTs are very infrequent; my indexes make each run as fast as 'possible'.
A buffer_pool of 15GB says that the entire table and all its indexes will live in the pool (once it is warmed up) -- for the current 10M rows. At 100M rows, it may still be OK. I say this because the queries that are likely to cause churn are the SELECTs, but they all say AND Time > ...; this implies a "working set" that is the "end" of the table. If you get to a billion rows, this paragraph needs revisiting.
MySQL should be able to handle a million INSERTs per day, even with the worst settings. So unless you expect to reach your 100M rows faster than in about 3 months (that pace is roughly 1.1M rows/day, or about 13 rows/second), I don't think the INSERTs are a problem.