Search code examples
mysqldatabaseindexingquery-optimizationexplain

MySQL show used index in query


For example I have created 3 index:

  • click_date - transaction table, daily_metric table
  • order_date - transaction table

I want to check does my query use index, I use EXPLAIN function and get this result:

+----+--------------+--------------+-------+---------------+------------+---------+------+--------+----------------------------------------------+
| id | select_type  | table        | type  | possible_keys | key        | key_len | ref  | rows   | Extra                                        |
+----+--------------+--------------+-------+---------------+------------+---------+------+--------+----------------------------------------------+
|  1 | PRIMARY      | <derived2>   | ALL   | NULL          | NULL       | NULL    | NULL |    668 | Using temporary; Using filesort              |
|  2 | DERIVED      | <derived3>   | ALL   | NULL          | NULL       | NULL    | NULL |    645 |                                              |
|  2 | DERIVED      | <derived4>   | ALL   | NULL          | NULL       | NULL    | NULL |    495 |                                              |
|  4 | DERIVED      | transaction  | ALL   | order_date    | NULL       | NULL    | NULL | 291257 | Using where; Using temporary; Using filesort |
|  3 | DERIVED      | daily_metric | range | click_date    | click_date | 3       | NULL | 812188 | Using where; Using temporary; Using filesort |
|  5 | UNION        | <derived7>   | ALL   | NULL          | NULL       | NULL    | NULL |    495 |                                              |
|  5 | UNION        | <derived6>   | ALL   | NULL          | NULL       | NULL    | NULL |    645 | Using where; Not exists                      |
|  7 | DERIVED      | transaction  | ALL   | order_date    | NULL       | NULL    | NULL | 291257 | Using where; Using temporary; Using filesort |
|  6 | DERIVED      | daily_metric | range | click_date    | click_date | 3       | NULL | 812188 | Using where; Using temporary; Using filesort |
| NULL | UNION RESULT | <union2,5>   | ALL   | NULL          | NULL       | NULL    | NULL |   NULL |                                              |
+----+--------------+--------------+-------+---------------+------------+---------+------+--------+----------------------------------------------+

In EXPLAIN results I see, that index order_date of transaction table is not used, do I correct understand ? Index click_date of daily_metric table was used correct ?

Please tell my how to understand from EXPLAIN result does my created index is used in query properly ?

My query:

SELECT
    partner_id,
    the_date,
    SUM(clicks) as clicks,
    SUM(total_count) as total_count,
    SUM(count) as count,
    SUM(total_sum) as total_sum,
    SUM(received_sum) as received_sum,
    SUM(partner_fee) as partner_fee
    FROM (
        SELECT
            clicks.partner_id,
            clicks.click_date as the_date,
            clicks,
            orders.total_count,
            orders.count,
            orders.total_sum,
            orders.received_sum,
            orders.partner_fee
        FROM
            (SELECT
                partner_id, click_date, sum(clicks) as clicks
            FROM
                daily_metric WHERE DATE(click_date) BETWEEN '2013-04-01' AND '2013-04-30'
            GROUP BY partner_id , click_date) as clicks
            LEFT JOIN
            (SELECT
                partner_id,
                    DATE(order_date) as order_dates,
                    SUM(order_sum) as total_sum,
                    SUM(customer_paid_sum) as received_sum,
                    SUM(partner_fee) as partner_fee,
                    count(*) as total_count,
                    count(CASE
                        WHEN status = 1 THEN 1
                        ELSE NULL
                    END) as count
            FROM
                transaction WHERE DATE(order_date) BETWEEN '2013-04-01' AND '2013-04-30'
            GROUP BY DATE(order_date) , partner_id) as orders ON orders.partner_id = clicks.partner_id AND clicks.click_date = orders.order_dates
        UNION ALL SELECT
            orders.partner_id,
            orders.order_dates as the_date,
            clicks,
            orders.total_count,
            orders.count,
            orders.total_sum,
            orders.received_sum,
            orders.partner_fee
        FROM
            (SELECT
                partner_id, click_date, sum(clicks) as clicks
            FROM
                daily_metric  WHERE DATE(click_date) BETWEEN '2013-04-01' AND '2013-04-30'
            GROUP BY partner_id , click_date) as clicks
                RIGHT JOIN
            (SELECT
                partner_id,
                    DATE(order_date) as order_dates,
                    SUM(order_sum) as total_sum,
                    SUM(customer_paid_sum) as received_sum,
                    SUM(partner_fee) as partner_fee,
                    count(*) as total_count,
                    count(CASE
                        WHEN status = 1 THEN 1
                        ELSE NULL
                    END) as count
            FROM
                transaction  WHERE DATE(order_date) BETWEEN '2013-04-01' AND '2013-04-30'
            GROUP BY DATE(order_date) , partner_id) as orders ON orders.partner_id = clicks.partner_id AND clicks.click_date = orders.order_dates
        WHERE
            clicks.partner_id is NULL
        ORDER BY the_date DESC
        ) as t
        GROUP BY the_date ORDER BY the_date DESC LIMIT 50 OFFSET 0

Solution

  • Although I can't explain what the EXPLAIN has dumped, I thought there must be an easier solution to what you have and came up with the following. I would suggest the following indexes to optimize your existing query for the WHERE date range and grouping by partner.

    Additionally, when you have a query that uses a FUNCTION on a field, it doesn't take advantage of the index. Such as your DATE(order_date) and DATE(click_date). To allow the index to better be used, qualify the full date/time such as 12:00am (morning) up to 11:59pm. I would typically to this via

    x >= someDate @12:00 and x < firstDayAfterRange.
    

    in your example would be (notice less than May 1st which gets up to April 30th at 11:59:59pm)

    click_date >= '2013-04-01' AND click_date < '2013-05-01'
    
    Table         Index
    transaction   (order_date, partner_id)
    daily_metric  (click_date, partner_id)
    

    Now, an adjustment. Since your clicks table may have entries the transactions dont, and vice-versa, I would adjust this query to do a pre-query of all possible date/partners, then left-join to respective aggregate queries such as:

    SELECT
          AllParnters.Partner_ID,
          AllParnters.the_Date,
          coalesce( clicks.clicks, 0 ) Clicks,
          coalesce( orders.total_count, 0 ) TotalCount,
          coalesce( orders.count, 0 ) OrderCount,
          coalesce( orders.total_sum, 0 ) OrderSum,
          coalesce( orders.received_sum, 0 ) ReceivedSum,
          coalesce( orders.partner_fee 0 ) PartnerFee
       from 
          ( select distinct
                  dm.partner_id, 
                  DATE( dm.click_date ) as the_Date
               FROM
                  daily_metric dm
               WHERE 
                  dm.click_date >= '2013-04-01' AND dm.click_date < '2013-05-01'
            UNION
            select
                  t.partner_id,
                  DATE(t.order_date) as the_Date
               FROM
                  transaction t
               WHERE 
                  t.order_date >= '2013-04-01' AND t.order_date < '2013-05-01' ) AllParnters
    
          LEFT JOIN
             ( SELECT
                     dm.partner_id, 
                     DATE( dm.click_date ) sumDate, 
                     sum( dm.clicks) as clicks
                  FROM
                     daily_metric dm
                  WHERE 
                     dm.click_date >= '2013-04-01' AND dm.click_date < '2013-05-01'
                 GROUP BY 
                    dm.partner_id, 
                    DATE( dm.click_date )  ) as clicks
             ON AllPartners.partner_id = clicks.partner_id
             AND AllPartners.the_date = clicks.sumDate
    
          LEFT JOIN
          ( SELECT 
                  t.partner_id,
                  DATE(t.order_date) as sumDate,
                  SUM(t.order_sum) as total_sum,
                  SUM(t.customer_paid_sum) as received_sum,
                  SUM(t.partner_fee) as partner_fee,
                  count(*) as total_count,
                  count(CASE WHEN t.status = 1 THEN 1 ELSE NULL END) as COUNT
               FROM
                  transaction t
               WHERE 
                  t.order_date >= '2013-04-01' AND t.order_date < '2013-05-01'
               GROUP BY 
                  t.partner_id,
                  DATE(t.order_date) ) as orders 
             ON AllPartners.partner_id = orders.partner_id
             AND AllPartners.the_date = orders.sumDate
       order by
          AllPartners.the_date DESC
       limit 50 offset 0
    

    This way, the first query will be quick on the index to get all possible combinations from EITHER table. Then the left-join will AT MOST join to one row per set. If found, get the number, if not, I am applying COALESCE() so if null, defaults to zero.

    CLARIFICATION.

    Like you when building your pre-aggregate queries of "clicks" and "orders", the "AllPartners" is the ALIAS result of the select distinct of partners and dates within the date range you were interested in. The resulting columns of that where were "partner_id" and "the_date" respective to your next queries. So this is the basis of joining to the aggregates of "clicks" and "orders". So, since I have these two columns in the alias "AllParnters", I just grabbed those for the field list since they are LEFT-JOINed to the other aliases and may not exist in either/or the respective others.