Search code examples
mysqlperformancetimezonedateaddconvert-tz

The best way to convert time zone efficiently in MYSQL query


My table 'my_logs' have about 20,000,000 records, and I want to find out how many logs I have in each date within a few days.

I want to have a result like

+------------+---------+ | date | count | +------------+---------+ | 2016-07-01 | 1623 | | 2016-07-02 | 1280 | | 2016-07-03 | 2032 | +------------+---------+

This query below only take me milliseconds to finish, that's good

SELECT  DATE_FORMAT(created_at, '%Y-%m-%d') as date,
        COUNT(*) as count
    FROM  my_logs
    WHERE  created_at BETWEEN '2016-07-01' AND '2016-07-04'
    GROUP BY  DATE_FORMAT(created_at, '%Y-%m-%d')

The Explain of query:

+------------+---------+-------+-----------------------------+ |select_type | table | type | possible_keys | +------------+---------+-------+-----------------------------+ | SIMPLE | my_logs| index | index_my_logs_on_created_at | +------------+---------+-------+-----------------------------+

+-----------------------------+---------+----------+ | key | key_len | rows | +-----------------------------+---------+----------+ | index_my_logs_on_created_at | 10 | 23458462 | +-----------------------------+---------+----------+

+-----------------------------------------------------------+ | Extra | +-----------------------------------------------------------+ | Using where; Using index; Using temporary; Using filesort | +-----------------------------------------------------------+

However, I need to convert the timezone of each record to fit the time in my country, and I need to group by the 'Date' information, so I need to convert the column itself.

Both

SELECT  COUNT(*)
    FROM  my_logs
    WHERE  DATE_ADD(created_at, INTERVAL 8 HOUR) BETWEEN '2016-07-01' AND '2016-07-04'
    GROUP BY  DATE_FORMAT(DATE_ADD(created_at, INTERVAL 8 HOUR), '%Y-%m-%d')

and

SELECT  COUNT(*)
    FROM  my_logs
    WHERE  CONVERT_TZ(created_at, "+00:00", "+08:00") BETWEEN '2016-07-01' AND '2016-07-04'
    GROUP BY  DATE_FORMAT(CONVERT_TZ(created_at, "+00:00", "+08:00"),

'%Y-%m-%d')

take me about 12s to finish the query, it is unbearable slow!!

(The Explain is the same as the query in the top)


I think it is common problem but I can't find a good way to deal with it, does anyone has a more efficient way to do it? Thanks!


Solution

  • Which datatype, TIMESTAMP vs. DATETIME, did you use? (But, I'll ignore that.)

    Do not "hide" an indexed column (created_at) inside any function (CONVERT_TZ()). It makes it so that the WHERE clause cannot use the index and must scan the table instead. This fix is simple:

    WHERE created_at >= '2016-07-01' - INTERVAL 8 HOUR
      AND created_at  < '2016-07-04' - INTERVAL 8 HOUR
    

    (or use CONVERT_TZ). Note that I also fixed the bug wherein you included midnight from the 4th. Note: Even + INTERVAL... is effectively a function.

    Expressions in the SELECT and the GROUP BY are far less critical to performance.