
Optimizing Datetime searches in huge MySQL InnoDB table


I am trying to optimize a big MySQL InnoDB table with 50 million rows. It is a kind of log: each row contains some information columns and a DATETIME column.

These 50 million rows span only 5-6 dates, so there are only a few distinct dates, each with many different hours, minutes, and seconds. Each row has a unique ID (the primary key). The DATETIME column has an index.

The searches filter on the date only (without hours, minutes, and seconds), e.g.:

select * from table where date(datetime_column) = '2021-03-08'

I've already tried rewriting the queries without the DATE() function, like:

select * from table where datetime_column >= '2021-03-08' and datetime_column <='2021-03-08 23:59:59'

But it's only a bit faster.

I've also created a new table holding the ID (the primary key of the main table) plus the year, month, day, hour, minutes, and seconds as TINYINTs (the year as INT(4)), put a composite index on them, and selected from the main table with a join to this new table. It's still not fast enough, because the hour, minute, and second parts of the index are useless when those columns are not mentioned in the WHERE clause.

I've also thought about partitioning, but I don't think it would help either.

Any ideas on how to solve it?


Solution

  • Change from

    PRIMARY KEY(id),
    INDEX(datetime)
    

    to

    PRIMARY KEY(datetime, id),  -- InnoDB clusters rows by the PK, so each day's rows sit together; greatly speeds up your range query
    INDEX(id)   -- sufficient to keep AUTO_INCREMENT happy
    
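    A minimal sketch of that change as a single ALTER TABLE, assuming hypothetical names log_table and datetime_column (use your real ones). The drop and both adds go in one statement so the AUTO_INCREMENT id always has an index starting with id. Two caveats: the PK columns must be NOT NULL, so datetime_column must contain no NULLs; and changing the PK makes InnoDB rebuild the whole table, which will take a while on 50 million rows.

    ```sql
    -- Hypothetical names: log_table and datetime_column.
    -- One statement, so id is never left without an index that starts with id.
    ALTER TABLE log_table
        DROP PRIMARY KEY,
        ADD PRIMARY KEY (datetime_column, id),
        ADD INDEX idx_id (id);

    -- The old single-column index on datetime_column is now a left prefix of the
    -- primary key and therefore redundant. Its real name is unknown here
    -- (idx_datetime is a placeholder); check SHOW CREATE TABLE before dropping it.
    -- ALTER TABLE log_table DROP INDEX idx_datetime;
    ```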

    Do not use DATE(datetime) = constant; because the column is wrapped in a function, the optimizer cannot use an index on it to seek to the matching rows. Your other attempt can use an index in some situations. I like this way to phrase it:

    WHERE datetime >= '2021-03-08'
      AND datetime  < '2021-03-08' + INTERVAL 1 DAY
    
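    A sketch for checking the plan, again with placeholder names. The exclusive upper bound also fixes a subtle gap in the <= '2021-03-08 23:59:59' version: if the column is declared with fractional seconds (e.g. DATETIME(3)), a value such as 23:59:59.500 would be missed. In the EXPLAIN output, look at the type and key columns. With only 5-6 dates, one day is roughly a fifth of the table, so with a secondary index the optimizer may judge a full table scan cheaper than millions of index lookups. That would be consistent with the rewrite being "only a bit faster". After the PK change, one would expect a range read on PRIMARY, i.e. one contiguous slice of the clustered index.

    ```sql
    -- log_table and datetime_column are placeholder names.
    -- Half-open range: everything on 2021-03-08, at any fractional-second precision.
    EXPLAIN
    SELECT *
    FROM log_table
    WHERE datetime_column >= '2021-03-08'
      AND datetime_column <  '2021-03-08' + INTERVAL 1 DAY;
    ```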

    Oh, you say there is more to the WHERE clause? Let's see the rest of it; it may make a big difference! Also, tell us whether the datetime range does most of the filtering or the other condition(s) do.

    Many queries look something like

    WHERE datetime in some range AND abc=123
    

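    That benefits from INDEX(abc, datetime), in that order: with the equality column first and the range column second, all matching entries form one contiguous slice of the index. Applying the PK trick above may also help: PRIMARY KEY(abc, datetime, id), INDEX(id).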
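    For concreteness, a sketch of the secondary-index variant. abc and 123 are the placeholders from the example above, and idx_abc_datetime is a hypothetical index name. The PK variant would be built with the same kind of single ALTER TABLE as the (datetime, id) change, listing abc before datetime_column.

    ```sql
    -- abc / 123 are placeholders; idx_abc_datetime is a hypothetical name.
    -- Equality column first, range column second.
    ALTER TABLE log_table ADD INDEX idx_abc_datetime (abc, datetime_column);

    -- A query of the shape that can be answered with one range scan of that index.
    SELECT *
    FROM log_table
    WHERE abc = 123
      AND datetime_column >= '2021-03-08'
      AND datetime_column <  '2021-03-08' + INTERVAL 1 DAY;
    ```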