Search code examples
mysqlruby-on-railsjoinmysql-slow-query-log

Slow Query on Rails joins


The following rails query throws back in slow query log:

Class ParserRun

scope :active, -> {
where(completed_at: nil)
  .joins('LEFT JOIN system_events ON parser_runs.id = system_events.parser_run_id')
  .where("system_events.created_at > '#{active_system_events_threshold}' OR parser_runs.created_at > '#{1.minute.ago.to_s(:db)}'")
}

How can I optimize this?

Slow querylog:

SELECT `parser_runs`.*
FROM `parser_runs`
INNER JOIN `system_events` ON `system_events`.`parser_run_id` = `parser_runs`.`id`
WHERE `parser_runs`.`type` IN ('DatasetParserRun')
  AND `parser_runs`.`completed_at` IS NULL
  AND (system_events.created_at <= '2017-08-05 04:03:09');

# Time: 170805  5:03:43

Output of 'show create table parser_runs;'

| parser_runs | CREATE TABLE `parser_runs` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`type` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`customer_id` int(11) DEFAULT NULL,
`options` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`completed_at` datetime DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `index_parser_runs_on_customer_id` (`customer_id`)
) ENGINE=InnoDB AUTO_INCREMENT=143327 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci |

output of 'show create table system_events;'

 | system_events | CREATE TABLE `system_events` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`log_level` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`customer_id` int(11) DEFAULT NULL,
`classification` int(11) DEFAULT NULL,
`information` text COLLATE utf8_unicode_ci,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
`parser_run_id` int(11) DEFAULT NULL,
`notified` tinyint(1) DEFAULT '0',
`dataset_log_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `index_system_events_on_classification` (`classification`),
KEY `index_system_events_on_customer_id` (`customer_id`),
KEY `index_system_events_on_parser_run_id` (`parser_run_id`),
KEY `index_system_events_on_dataset_log_id` (`dataset_log_id`)
) ENGINE=InnoDB AUTO_INCREMENT=730539 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci |

Output of EXPLAIN:

  EXPLAIN for: SELECT `parser_runs`.* FROM `parser_runs` LEFT JOIN system_events ON parser_runs.id = system_events.parser_run_id WHERE `parser_runs`.`completed_at` IS NULL AND (system_events.created_at > '2017-08-07 10:09:03')
+----+-------------+---------------+--------+-------------------------   -------------+---------+---------+--------------------------------------+-    -------+-------------+
| id | select_type | table         | type   | possible_keys                          | key     | key_len | ref                                  | rows   |    Extra       |
+----+-------------+---------------+--------+--------------------------------------+---------+---------+--------------------------------------+--------+-------------+
|  1 | SIMPLE      | system_events | ALL    | index_system_events_on_parser_run_id | NULL    | NULL    | NULL                                    | 655946 | Using where |
|  1 | SIMPLE      | parser_runs   | eq_ref | PRIMARY                                | PRIMARY | 4       | ashblood.system_events.parser_run_id |      1 | Using where |
+----+-------------+---------------+--------+--------------------------------------+---------+---------+--------------------------------------+--------+-------------+

2 rows in set (0.00 sec)


Solution

  • The first step in the query execution plan (the output of EXPLAIN SELECT ...) indicates that the whole system_events table is being scanned in order to check which rows in the system_events table will be used in the join with the parser_runs table.

    Please, add an index on the created_at column in the system_events and repeat the query. Please, check the new execution path to verify whether the whole table is being scanned, or if the new index is being used.

    In addition, although probably not the root of the problem, you could add an index on the type and completed_at columns of the table parser_runs. Please, note that I mean an index on both columns (in the given order) instead of an index on each column.