Below are my table structure and the indexes I have created. The table has 160+ million rows.
create table test
(
  client_id varchar(100),
  user_id varchar(100),
  ad_id varchar(100),
  attr0 varchar(250),
  scp_id varchar(250),
  attr1 datetime null default null,
  attr2 datetime null default null,
  attr3 datetime null default null,
  attr4 datetime null default null,
  sent_date date null default null,
  channel varchar(100)
) ENGINE=InnoDB;
CREATE INDEX idx_test_cid_sd ON test (client_id,sent_date);
CREATE INDEX idx_test_uid ON test (user_id);
CREATE INDEX idx_test_aid ON test (ad_id);
Below is the query I am running:
select
count(distinct user_id) as users
, count(distinct ad_id) as ads
, count(attr1) as attr1
, count(attr2) as attr2
, count(attr3) as attr3
, count(attr4) as attr4
from test
where client_id = 'abcxyz'
and sent_date >= '2017-01-01' and sent_date < '2017-02-01';
Issue: the above query takes a very long time, more than an hour, to return the result. When I look at the EXPLAIN plan, it is using the index and scanning only about 8 million rows, but the weird problem is that it still takes more than an hour to return the results.
Can anyone tell me what is going wrong here, or suggest anything on the optimization side?
Shrink the table to decrease the need for I/O. This includes normalizing (where practical) and using AUTO_INCREMENT ids of reasonable size for the various ids instead of VARCHAR. If you could explain those varchars, I could assess whether this is practical and how much benefit you might get.
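A minimal sketch of that normalization, assuming the user_id strings repeat heavily (the lookup-table name and layout are illustrative, not from the question):

CREATE TABLE users (
  id      INT UNSIGNED NOT NULL AUTO_INCREMENT,
  user_id VARCHAR(100) NOT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY (user_id)
) ENGINE=InnoDB;

The big table would then carry the 4-byte users.id instead of up to 100 bytes of VARCHAR, shrinking both the data and every index that references it.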
Have a PRIMARY KEY; InnoDB does not like not having one. (This will not help this particular problem.) If some combination of columns is UNIQUE, then make that the PK. If not, use id INT UNSIGNED AUTO_INCREMENT; it won't run out of ids until after 4 billion rows.
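A sketch of adding such a surrogate key, assuming no natural unique combination exists (on a 160M-row table this ALTER rebuilds the table, so schedule it accordingly):

ALTER TABLE test
  ADD COLUMN id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  ADD PRIMARY KEY (id);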
Change the PRIMARY KEY to make the query run faster (though perhaps not faster than Simulant's "covering" index); it would, however, be less bulky. Assuming you add id .. AUTO_INCREMENT, then:
PRIMARY KEY(client_id, sent_date, id),
INDEX(id)
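Putting both pieces together, something like this (skip the ADD COLUMN if you already added id above; note that MySQL will implicitly make client_id and sent_date NOT NULL as part of the PK, which fails if either column contains NULLs):

ALTER TABLE test
  ADD COLUMN id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  ADD PRIMARY KEY (client_id, sent_date, id),
  ADD INDEX (id);

The INDEX(id) is required because an AUTO_INCREMENT column must be the leftmost column of some index, and it no longer is in the new PK.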
How big (GB) is the data? The indexes? You may be at the cusp of "too big to cache", and paying for more RAM may help.
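One way to check, using the standard information_schema sizes (the figures are approximate for InnoDB):

SELECT table_name,
       ROUND(data_length  / 1024 / 1024 / 1024, 2) AS data_gb,
       ROUND(index_length / 1024 / 1024 / 1024, 2) AS index_gb
FROM   information_schema.tables
WHERE  table_name = 'test';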
A Summary Table works well for COUNT, but not great for COUNT(DISTINCT ...). That is, the plain counts could be done in seconds. For the uniques, see my blog; alas, it is rather sketchy, so ask for help. It provides a way to roll up COUNT(DISTINCT ...) as efficiently as COUNT, but with a 1-2% error.
The gist of a Summary Table: PRIMARY KEY(client_id, day) with columns for each day's counts. Then getting the values for a month is SUMming the counts for 31 days. Very fast. More on Summary Tables.
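A minimal sketch of such a summary table and the monthly roll-up (table and column names are illustrative; a nightly job would insert one row per client per day):

CREATE TABLE daily_summary (
  client_id VARCHAR(100) NOT NULL,
  day       DATE         NOT NULL,
  attr1_ct  INT UNSIGNED NOT NULL,
  attr2_ct  INT UNSIGNED NOT NULL,
  attr3_ct  INT UNSIGNED NOT NULL,
  attr4_ct  INT UNSIGNED NOT NULL,
  PRIMARY KEY (client_id, day)
) ENGINE=InnoDB;

-- A month's counts become a SUM over at most 31 tiny rows:
SELECT SUM(attr1_ct), SUM(attr2_ct), SUM(attr3_ct), SUM(attr4_ct)
FROM   daily_summary
WHERE  client_id = 'abcxyz'
  AND  day >= '2017-01-01' AND day < '2017-02-01';

Note that COUNT(DISTINCT user_id) cannot be summed across days this way; that is the piece the 1-2%-error technique above addresses.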