I have a table like this:
CREATE TABLE `purchase` (
`fact_purchase_id` binary(16) NOT NULL,
`purchase_id` int(10) unsigned NOT NULL,
`purchase_id_primary` int(10) unsigned DEFAULT NULL,
`person_id` int(10) unsigned NOT NULL,
`person_id_owner` int(10) unsigned NOT NULL,
`service_id` int(10) unsigned NOT NULL,
`fact_count` int(10) unsigned NOT NULL DEFAULT '0',
`fact_type` tinyint(3) unsigned NOT NULL,
`date_fact` date NOT NULL,
`purchase_name` varchar(255) DEFAULT NULL,
`activation_price` decimal(7,2) unsigned NOT NULL DEFAULT '0.00',
`activation_price_total` decimal(7,2) unsigned NOT NULL DEFAULT '0.00',
`renew_price` decimal(7,2) unsigned DEFAULT '0.00',
`renew_price_total` decimal(7,2) unsigned NOT NULL DEFAULT '0.00',
`activation_cost` decimal(7,2) unsigned DEFAULT '0.00',
`activation_cost_total` decimal(7,2) unsigned NOT NULL DEFAULT '0.00',
`renew_cost` decimal(7,2) unsigned DEFAULT '0.00',
`renew_cost_total` decimal(7,2) unsigned NOT NULL DEFAULT '0.00',
`date_created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`fact_purchase_id`),
KEY `purchase_id_idx` (`purchase_id`),
KEY `person_id_idx` (`person_id`),
KEY `person_id_owner_idx` (`person_id_owner`),
KEY `service_id_idx` (`service_id`),
KEY `fact_type_idx` (`fact_type`),
KEY `renew_price_idx` (`renew_price`),
KEY `renew_cost_idx` (`renew_cost`),
KEY `renew_price_year_idx` (`renew_price_year`),
KEY `renew_cost_year_idx` (`renew_cost_year`),
KEY `date_created_idx` (`date_created`),
KEY `purchase_id_primary_idx` (`purchase_id_primary`),
KEY `fact_count` (`fact_count`),
KEY `renew_price_year_total_idx` (`renew_price_total`),
KEY `renew_cost_year_total_idx` (`renew_cost_total`),
KEY `date_fact` (`date_fact`) USING BTREE,
CONSTRAINT `purchase_person_fk` FOREIGN KEY (`person_id`) REFERENCES `person` (`person_id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `purchase_person_owner_fk` FOREIGN KEY (`person_id_owner`) REFERENCES `person` (`person_id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `purchase_service_fk` FOREIGN KEY (`service_id`) REFERENCES `service` (`service_id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
I'm launching this query:
SELECT
purchase.date_fact,
UNIX_TIMESTAMP(purchase.date_fact),
COUNT(DISTINCT purchase.purchase_id) AS Num
FROM
purchase
WHERE
purchase.date_fact >= '2017-01-01'
AND purchase.date_fact <= '2017-01-31'
AND purchase.fact_type = 3
AND purchase.purchase_id_primary IS NULL
GROUP BY purchase.date_fact
The table contains a total of 5.629.670 records and running an EXPLAIN
on the query I get these results:
rows
= 2.814.835possible_keys
= fact_type_idx,purchase_id_primary_idx,date_fact
key
= fact_type_idx
key_len
= 1ref
= const
filtered
= 25.00Extra
= Using index condition;Using where;Using filesort
The query takes 30-35 seconds to be executed. This is too long to wait.
The problem is that the GROUP BY
causes filesort to be applied. Applying ORDER BY NULL
to the query doesn't change anything.
I could possibly use a covering index, but I just need date_fact in this query: which fields can I use?
How can I avoid filesort on GROUP BY
? How can I optimize the query in order to make it faster?
I'm using this table for statistics purposes (OLAP). Maybe is there any better DBMS for this purpose?
I'm running MySql Server 5.7.17.
Thank you
For this query:
SELECT p.date_fact, UNIX_TIMESTAMP(p.date_fact),
COUNT(DISTINCT p.purchase_id) AS Num
FROM purchase p
WHERE p.date_fact >= '2017-01-01' AND
p.date_fact <= '2017-01-31' AND
p.fact_type = 3 AND
p.purchase_id_primary IS NULL
GROUP BY p.date_fact;
I would recommend a compound index on (fact_type, purchase_id_primary, date_fact, purchase_id)
. The first two keys have equality conditions in the WHERE
. The third has an inequality, and the fourth allows the index to "cover" the query (all columns in the query are in the index).
I would also add: if you don't need COUNT(DISTINCT)
, then don't use it. purchase_id
might already be unique in purchase
.