Tags: mysql, sql, innodb, sql-optimization, amazon-aurora

Select Query is very slow on MySQL InnoDB Table having 160M+ rows


Below are my table structure and the indexes I have created. The table has 160+ million rows.

create table test
(
client_id varchar(100),
user_id varchar(100),
ad_id varchar(100),
attr0 varchar(250),
scp_id varchar(250),
attr1 datetime null default null,
attr2 datetime null default null,
attr3 datetime null default null,
attr4 datetime null default null,
sent_date date null default null,
channel varchar(100)
)ENGINE=InnoDB;

CREATE INDEX idx_test_cid_sd ON test (client_id,sent_date);
CREATE INDEX idx_test_uid ON test (user_id);
CREATE INDEX idx_test_aid ON test (ad_id);

Below is the query I am running:

select 
  count(distinct user_id) as users
, count(distinct ad_id) as ads
, count(attr1) as attr1
, count(attr2) as attr2
, count(attr3) as attr3
, count(attr4) as attr4
from test
where client_id = 'abcxyz'
and sent_date >= '2017-01-01' and sent_date < '2017-02-01';

Issues: The above query takes more than 1 hour to return the result. The EXPLAIN plan shows it is using the index and scanning only about 8 million records, but the weird part is that it still takes over an hour to return the results.

Can anyone tell me what is going wrong here, or offer any suggestions on the optimization part?


Solution

    • Shrink the table to decrease the need for I/O. This includes normalizing where practical: use AUTO_INCREMENT ids of a reasonable size for the various ids instead of VARCHAR, as sketched below. If you could explain what those VARCHARs contain, I could assess whether this is practical and how much benefit you might get.
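
    A minimal sketch of the idea for one of the ids (the users table and its columns are hypothetical names, not something from the question):

    CREATE TABLE users (
        id      INT UNSIGNED NOT NULL AUTO_INCREMENT,
        user_id VARCHAR(100) NOT NULL,
        PRIMARY KEY(id),
        UNIQUE(user_id)   -- each VARCHAR value is stored exactly once
    ) ENGINE=InnoDB;

    -- test would then carry the 4-byte users.id instead of the VARCHAR(100),
    -- shrinking both the data and every index that mentions user_id.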

    • Have a PRIMARY KEY. InnoDB does not like not having one. (This will not help this particular problem.) If some combination of columns is UNIQUE, then make that the PK. If not, use id INT UNSIGNED AUTO_INCREMENT; it won't run out of ids until after 4 billion rows.
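
    In its simplest form, that is just (a sketch; adding a column to a 160M-row table is a long-running ALTER):

    ALTER TABLE test
        ADD COLUMN id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY;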

    • Change the PRIMARY KEY to make the query run faster. (Though perhaps not faster than Simulant's "covering" index.) But it would be less bulky:

    Assuming you add id .. AUTO_INCREMENT, then:

    PRIMARY KEY(client_id, sent_date, id),
    INDEX(id)
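
    Spelled out as a single statement, assuming you have not already added id (a sketch, untested against your data; InnoDB requires the AUTO_INCREMENT column to be the first column of some index, hence the extra INDEX(id)):

    ALTER TABLE test
        ADD COLUMN id INT UNSIGNED NOT NULL AUTO_INCREMENT,
        ADD PRIMARY KEY(client_id, sent_date, id),
        ADD INDEX(id);

    With that PK, idx_test_cid_sd becomes redundant and can be dropped. (Simulant's covering index, not shown here, would presumably list all eight columns the query touches so it never has to read the table rows at all.)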
    

    How big (GB) is the data? The indexes? You may be at the cusp of "too big to cache", and paying for more RAM may help.
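
    A quick way to answer that, using the standard information_schema view:

    SELECT table_name,
           ROUND(data_length /1024/1024/1024, 2) AS data_gb,
           ROUND(index_length/1024/1024/1024, 2) AS index_gb
    FROM information_schema.tables
    WHERE table_name = 'test';

    SELECT @@innodb_buffer_pool_size;  -- compare the totals against this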

    • Summary tables are great for COUNT, but not great for COUNT(DISTINCT ...). That is, the plain counts could be done in seconds. For the uniques, see my blog. Alas, it is rather sketchy; ask for help. It describes rolling up COUNT(DISTINCT ...) as efficiently as COUNT, but with a 1-2% error.

    The gist of a summary table: PRIMARY KEY(client_id, day), with columns for each day's counts, as sketched below. Getting the values for a month is then a matter of SUMming the counts for up to 31 days. Very fast. More on Summary Tables.
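
    A minimal sketch (the daily_summary name and *_ct columns are hypothetical; COUNT(DISTINCT ...) is deliberately left out, per the caveat above):

    CREATE TABLE daily_summary (
        client_id VARCHAR(100) NOT NULL,
        day       DATE         NOT NULL,
        attr1_ct  INT UNSIGNED NOT NULL,
        attr2_ct  INT UNSIGNED NOT NULL,
        attr3_ct  INT UNSIGNED NOT NULL,
        attr4_ct  INT UNSIGNED NOT NULL,
        PRIMARY KEY(client_id, day)
    ) ENGINE=InnoDB;

    -- Roll up one day (run nightly); COUNT(col) counts non-NULL values,
    -- matching the semantics of the original query:
    INSERT INTO daily_summary
    SELECT client_id, sent_date,
           COUNT(attr1), COUNT(attr2), COUNT(attr3), COUNT(attr4)
    FROM test
    WHERE sent_date = '2017-01-15'
    GROUP BY client_id, sent_date;

    -- The monthly report then reads at most 31 tiny rows per client:
    SELECT SUM(attr1_ct) AS attr1,
           SUM(attr2_ct) AS attr2,
           SUM(attr3_ct) AS attr3,
           SUM(attr4_ct) AS attr4
    FROM daily_summary
    WHERE client_id = 'abcxyz'
      AND day >= '2017-01-01' AND day < '2017-02-01';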