Search code examples
mysqlsqlmariadbquery-optimization

Improving slow MariaDB query performance


I have what seems to be a fairly straightforward query, but it is super slow and I would like to improve its performance if I can.

SELECT `contacts`.`unit_id`, `contacts`.`owner_id`, `units`.`description`, 
  `units`.`address`, `owners`.`name`, `owners`.`email`, COUNT(*) AS contact_count
FROM `contacts`
LEFT JOIN `units` ON `contacts`.`unit_id` = `units`.`id`
LEFT JOIN `owners` ON `contacts`.`owner_id` = `owners`.`id`
WHERE `owners.group_id` = 6
  AND `contacts`.`checkin` BETWEEN '2021-10-01 00:00:00' AND '2021-10-31 23:59:59'
GROUP BY `units`.`id`
ORDER BY `contact_count` DESC
LIMIT 20;

I'm just trying to get the units with the most contacts in a given date range, and belonging to a certain group of owners.

+------+-------------+----------+--------+--------------------------------------------------+---------------------------+---------+-------------------------+------+---------------------------------+
| id   | select_type | table    | type   | possible_keys                                    | key                       | key_len | ref                     | rows | Extra                           |
+------+-------------+----------+--------+--------------------------------------------------+---------------------------+---------+-------------------------+------+---------------------------------+
|    1 | SIMPLE      | owners   | ref    | PRIMARY,owners_group_id_foreign                  | owners_group_id_foreign   | 4       | const                   | 1133 | Using temporary; Using filesort |
|    1 | SIMPLE      | contacts | ref    | contacts_checkin_index,contacts_owner_id_foreign | contacts_owner_id_foreign | 4       | appdb.owners.id         | 1145 | Using where                     |
|    1 | SIMPLE      | units    | eq_ref | PRIMARY                                          | PRIMARY                   | 4       | appdb.contacts.unit_id  |    1 |                                 |
+------+-------------+----------+--------+--------------------------------------------------+---------------------------+---------+-------------------------+------+---------------------------------+

As near as I can tell, everything that should be indexed is:

CREATE TABLE `contacts` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `owner_id` int(10) unsigned NOT NULL,
  `unit_id` int(10) unsigned NOT NULL,
  `terminal_id` int(10) unsigned NOT NULL,
  `checkin` datetime NOT NULL
  PRIMARY KEY (`id`),
  KEY `contacts_checkin_index` (`checkin`),
  KEY `contacts_unit_id_foreign` (`unit_id`),
  KEY `contacts_terminal_id_foreign` (`terminal_id`),
  KEY `contacts_owner_id_foreign` (`owner_id`),
  CONSTRAINT `contacts_unit_id_foreign` FOREIGN KEY (`unit_id`) REFERENCES `units` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
  CONSTRAINT `contacts_terminal_id_foreign` FOREIGN KEY (`terminal_id`) REFERENCES `terminals` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
  CONSTRAINT `contacts_owner_id_foreign` FOREIGN KEY (`owner_id`) REFERENCES `owners` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=25528530 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

The contacts table currently has about 10 million rows, and this query takes about 4 minutes to run. Is this anything that can be improved significantly or am I just bumping up against limitations of my hardware at this point?


Solution

  • SELECT  sub.unit_id, sub.owner_id, u.`description`, u.`address`,
            sub.name, sub.email,
            sub.contact_count
        FROM  
            ( SELECT  c.`unit_id`, c.`owner_id`,
                      o.`name`, o.`email`,
                      COUNT(*) AS contact_count
                FROM  `contacts` AS c
                JOIN  `owners` AS o  ON c.`owner_id` = o.`id`
                WHERE  o.`group_id` = 6
                  AND  c.`checkin` >= '2021-10-01'
                  AND  c.`checkin` <  '2021-10-01' + INTERVAL 1 MONTH
                GROUP BY  c.`unit_id`
                ORDER BY  `contact_count` DESC
                LIMIT  20 
            ) AS sub
        LEFT JOIN  `units` AS u  ON sub.`unit_id` = u.`id`
        ORDER BY  `contact_count` DESC, sub.unit_id DESC;
    

    Notes:

    • I turned it inside out in order to hit units only 20 times.
    • JOIN owners cannot be LEFT JOIN, so I changed that.
    • I changed the GROUP BY to avoid using units prematurely.
    • Possibly the GROUP BY is now redundant.
    • I changed the date range to make it easier to be generic.
    • I augmented the ORDER BY to make it deterministic in case of dup counts.
    • Notice, below, how "composite" indexes can be helpful.

    Indexes that may help:

    contacts:  INDEX(checkin, unit_id, owner_id)
    contacts:  INDEX(owner_id, checkin, unit_id)
    owners:  INDEX(group_id, id,  name, email)
    

    When adding those, remove any INDEXes that start with the same columns. Example: contacts: INDEX(checkin)