Search code examples
mysqlsqlquery-optimizationamazon-rdssql-optimization

How to optimise this MySQL query? Millions of Rows


I have the following query:

SELECT 
    analytics.source AS referrer, 
    COUNT(analytics.id) AS frequency, 
    SUM(IF(transactions.status = 'COMPLETED', 1, 0)) AS sales
FROM analytics
LEFT JOIN transactions ON analytics.id = transactions.analytics
WHERE analytics.user_id = 52094 
GROUP BY analytics.source 
ORDER BY frequency DESC 
LIMIT 10 

The analytics table has 60M rows and the transactions table has 3M rows.

When I run an EXPLAIN on this query, I get:

+------+--------------+-----------------+--------+---------------------+-------------------+----------------------+---------------------------+----------+-----------+-------------------------------------------------+
| # id |  select_type |      table      |  type  |    possible_keys    |        key        |        key_len       |            ref            |   rows   |   Extra   |                                                 |
+------+--------------+-----------------+--------+---------------------+-------------------+----------------------+---------------------------+----------+-----------+-------------------------------------------------+
| '1'  |  'SIMPLE'    |  'analytics'    |  'ref' |  'analytics_user_id | analytics_source' |  'analytics_user_id' |  '5'                      |  'const' |  '337662' |  'Using where; Using temporary; Using filesort' |
| '1'  |  'SIMPLE'    |  'transactions' |  'ref' |  'tran_analytics'   |  'tran_analytics' |  '5'                 |  'dijishop2.analytics.id' |  '1'     |  NULL     |                                                 |
+------+--------------+-----------------+--------+---------------------+-------------------+----------------------+---------------------------+----------+-----------+-------------------------------------------------+

I can't figure out how to optimise this query as it's already very basic. It takes around 70 seconds to run this query.

Here are the indexes that exist:

+-------------+-------------+----------------------------+---------------+------------------+------------+--------------+-----------+---------+--------+-------------+----------+----------------+
|   # Table   |  Non_unique |          Key_name          |  Seq_in_index |    Column_name   |  Collation |  Cardinality |  Sub_part |  Packed |  Null  |  Index_type |  Comment |  Index_comment |
+-------------+-------------+----------------------------+---------------+------------------+------------+--------------+-----------+---------+--------+-------------+----------+----------------+
| 'analytics' |  '0'        |  'PRIMARY'                 |  '1'          |  'id'            |  'A'       |  '56934235'  |  NULL     |  NULL   |  ''    |  'BTREE'    |  ''      |  ''            |
| 'analytics' |  '1'        |  'analytics_user_id'       |  '1'          |  'user_id'       |  'A'       |  '130583'    |  NULL     |  NULL   |  'YES' |  'BTREE'    |  ''      |  ''            |
| 'analytics' |  '1'        |  'analytics_product_id'    |  '1'          |  'product_id'    |  'A'       |  '490812'    |  NULL     |  NULL   |  'YES' |  'BTREE'    |  ''      |  ''            |
| 'analytics' |  '1'        |  'analytics_affil_user_id' |  '1'          |  'affil_user_id' |  'A'       |  '55222'     |  NULL     |  NULL   |  'YES' |  'BTREE'    |  ''      |  ''            |
| 'analytics' |  '1'        |  'analytics_source'        |  '1'          |  'source'        |  'A'       |  '24604'     |  NULL     |  NULL   |  'YES' |  'BTREE'    |  ''      |  ''            |
| 'analytics' |  '1'        |  'analytics_country_name'  |  '1'          |  'country_name'  |  'A'       |  '39510'     |  NULL     |  NULL   |  'YES' |  'BTREE'    |  ''      |  ''            |
| 'analytics' |  '1'        |  'analytics_gordon'        |  '1'          |  'id'            |  'A'       |  '56934235'  |  NULL     |  NULL   |  ''    |  'BTREE'    |  ''      |  ''            |
| 'analytics' |  '1'        |  'analytics_gordon'        |  '2'          |  'user_id'       |  'A'       |  '56934235'  |  NULL     |  NULL   |  'YES' |  'BTREE'    |  ''      |  ''            |
| 'analytics' |  '1'        |  'analytics_gordon'        |  '3'          |  'source'        |  'A'       |  '56934235'  |  NULL     |  NULL   |  'YES' |  'BTREE'    |  ''      |  ''            |
+-------------+-------------+----------------------------+---------------+------------------+------------+--------------+-----------+---------+--------+-------------+----------+----------------+


+----------------+-------------+-------------------+---------------+-------------------+------------+--------------+-----------+---------+--------+-------------+----------+----------------+
|    # Table     |  Non_unique |      Key_name     |  Seq_in_index |    Column_name    |  Collation |  Cardinality |  Sub_part |  Packed |  Null  |  Index_type |  Comment |  Index_comment |
+----------------+-------------+-------------------+---------------+-------------------+------------+--------------+-----------+---------+--------+-------------+----------+----------------+
| 'transactions' |  '0'        |  'PRIMARY'        |  '1'          |  'id'             |  'A'       |  '2436151'   |  NULL     |  NULL   |  ''    |  'BTREE'    |  ''      |  ''            |
| 'transactions' |  '1'        |  'tran_user_id'   |  '1'          |  'user_id'        |  'A'       |  '56654'     |  NULL     |  NULL   |  ''    |  'BTREE'    |  ''      |  ''            |
| 'transactions' |  '1'        |  'transaction_id' |  '1'          |  'transaction_id' |  'A'       |  '2436151'   |  '191'    |  NULL   |  'YES' |  'BTREE'    |  ''      |  ''            |
| 'transactions' |  '1'        |  'tran_analytics' |  '1'          |  'analytics'      |  'A'       |  '2436151'   |  NULL     |  NULL   |  'YES' |  'BTREE'    |  ''      |  ''            |
| 'transactions' |  '1'        |  'tran_status'    |  '1'          |  'status'         |  'A'       |  '22'        |  NULL     |  NULL   |  'YES' |  'BTREE'    |  ''      |  ''            |
| 'transactions' |  '1'        |  'gordon_trans'   |  '1'          |  'status'         |  'A'       |  '22'        |  NULL     |  NULL   |  'YES' |  'BTREE'    |  ''      |  ''            |
| 'transactions' |  '1'        |  'gordon_trans'   |  '2'          |  'analytics'      |  'A'       |  '2436151'   |  NULL     |  NULL   |  'YES' |  'BTREE'    |  ''      |  ''            |
+----------------+-------------+-------------------+---------------+-------------------+------------+--------------+-----------+---------+--------+-------------+----------+----------------+

Simplified schema for the two tables before adding any extra indexes as suggested as it didn't improve the situation.

CREATE TABLE `analytics` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `user_id` int(11) DEFAULT NULL,
  `affil_user_id` int(11) DEFAULT NULL,
  `product_id` int(11) DEFAULT NULL,
  `medium` varchar(45) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `source` varchar(45) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `terms` varchar(1024) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `is_browser` tinyint(1) DEFAULT NULL,
  `is_mobile` tinyint(1) DEFAULT NULL,
  `is_robot` tinyint(1) DEFAULT NULL,
  `browser` varchar(45) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `mobile` varchar(45) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `robot` varchar(45) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `platform` varchar(45) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `referrer` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `domain` varchar(45) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `ip` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `continent_code` varchar(10) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `country_name` varchar(100) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `city` varchar(100) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  KEY `analytics_user_id` (`user_id`),
  KEY `analytics_product_id` (`product_id`),
  KEY `analytics_affil_user_id` (`affil_user_id`)
) ENGINE=InnoDB AUTO_INCREMENT=64821325 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

CREATE TABLE `transactions` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `transaction_id` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `user_id` int(11) NOT NULL,
  `pay_key` varchar(50) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `sender_email` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `amount` decimal(10,2) DEFAULT NULL,
  `currency` varchar(10) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `status` varchar(50) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `analytics` int(11) DEFAULT NULL,
  `ip_address` varchar(46) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `session_id` varchar(60) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `eu_vat_applied` int(1) DEFAULT '0',
  PRIMARY KEY (`id`),
  KEY `tran_user_id` (`user_id`),
  KEY `transaction_id` (`transaction_id`(191)),
  KEY `tran_analytics` (`analytics`),
  KEY `tran_status` (`status`)
) ENGINE=InnoDB AUTO_INCREMENT=10019356 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

If the above can not be optimised any further. Any implementation advice on summary tables will be great. We are using a LAMP stack on AWS. The above query is running on RDS (m1.large).


Solution

  • I would create the following indexes (b-tree indexes):

    analytics(user_id, source, id) 
    transactions(analytics, status)
    

    This is different from Gordon's suggestion.

    The order of columns in the index is important.

    You filter by specific analytics.user_id, so this field has to be the first in the index. Then you group by analytics.source. To avoid sorting by source this should be the next field of the index. You also reference analytics.id, so it is better to have this field as part of the index, put it last. Is MySQL capable of reading just the index and not touching the table? I don't know, but it is rather easy to test.

    Index on transactions has to start with analytics, because it would be used in the JOIN. We also need status.

    SELECT 
        analytics.source AS referrer, 
        COUNT(analytics.id) AS frequency, 
        SUM(IF(transactions.status = 'COMPLETED', 1, 0)) AS sales
    FROM analytics
    LEFT JOIN transactions ON analytics.id = transactions.analytics
    WHERE analytics.user_id = 52094 
    GROUP BY analytics.source 
    ORDER BY frequency DESC 
    LIMIT 10