I have what seems to be a fairly straightforward query, but it is super slow and I would like to improve its performance if I can.
SELECT `contacts`.`unit_id`, `contacts`.`owner_id`, `units`.`description`,
`units`.`address`, `owners`.`name`, `owners`.`email`, COUNT(*) AS contact_count
FROM `contacts`
LEFT JOIN `units` ON `contacts`.`unit_id` = `units`.`id`
LEFT JOIN `owners` ON `contacts`.`owner_id` = `owners`.`id`
WHERE `owners.group_id` = 6
AND `contacts`.`checkin` BETWEEN '2021-10-01 00:00:00' AND '2021-10-31 23:59:59'
GROUP BY `units`.`id`
ORDER BY `contact_count` DESC
LIMIT 20;
I'm just trying to get the units with the most contacts in a given date range, and belonging to a certain group of owners.
+------+-------------+----------+--------+--------------------------------------------------+---------------------------+---------+-------------------------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+----------+--------+--------------------------------------------------+---------------------------+---------+-------------------------+------+---------------------------------+
| 1 | SIMPLE | owners | ref | PRIMARY,owners_group_id_foreign | owners_group_id_foreign | 4 | const | 1133 | Using temporary; Using filesort |
| 1 | SIMPLE | contacts | ref | contacts_checkin_index,contacts_owner_id_foreign | contacts_owner_id_foreign | 4 | appdb.owners.id | 1145 | Using where |
| 1 | SIMPLE | units | eq_ref | PRIMARY | PRIMARY | 4 | appdb.contacts.unit_id | 1 | |
+------+-------------+----------+--------+--------------------------------------------------+---------------------------+---------+-------------------------+------+---------------------------------+
As near as I can tell, everything that should be indexed is:
CREATE TABLE `contacts` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`owner_id` int(10) unsigned NOT NULL,
`unit_id` int(10) unsigned NOT NULL,
`terminal_id` int(10) unsigned NOT NULL,
`checkin` datetime NOT NULL
PRIMARY KEY (`id`),
KEY `contacts_checkin_index` (`checkin`),
KEY `contacts_unit_id_foreign` (`unit_id`),
KEY `contacts_terminal_id_foreign` (`terminal_id`),
KEY `contacts_owner_id_foreign` (`owner_id`),
CONSTRAINT `contacts_unit_id_foreign` FOREIGN KEY (`unit_id`) REFERENCES `units` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `contacts_terminal_id_foreign` FOREIGN KEY (`terminal_id`) REFERENCES `terminals` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `contacts_owner_id_foreign` FOREIGN KEY (`owner_id`) REFERENCES `owners` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=25528530 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
The contacts
table currently has about 10 million rows, and this query takes about 4 minutes to run. Is this anything that can be improved significantly or am I just bumping up against limitations of my hardware at this point?
SELECT sub.unit_id, sub.owner_id, u.`description`, u.`address`,
sub.name, sub.email,
sub.contact_count
FROM
( SELECT c.`unit_id`, c.`owner_id`,
o.`name`, o.`email`,
COUNT(*) AS contact_count
FROM `contacts` AS c
JOIN `owners` AS o ON c.`owner_id` = o.`id`
WHERE o.`group_id` = 6
AND c.`checkin` >= '2021-10-01'
AND c.`checkin` < '2021-10-01' + INTERVAL 1 MONTH
GROUP BY c.`unit_id`
ORDER BY `contact_count` DESC
LIMIT 20
) AS sub
LEFT JOIN `units` AS u ON sub.`unit_id` = u.`id`
ORDER BY `contact_count` DESC, sub.unit_id DESC;
Notes:
units
only 20 times.JOIN owners
cannot be LEFT JOIN
, so I changed that.GROUP BY
to avoid using units
prematurely.GROUP BY
is now redundant.ORDER BY
to make it deterministic in case of dup counts.Indexes that may help:
contacts: INDEX(checkin, unit_id, owner_id)
contacts: INDEX(owner_id, checkin, unit_id)
owners: INDEX(group_id, id, name, email)
When adding those, remove any INDEXes
that start with the same columns. Example: contacts: INDEX(checkin)