Search code examples
mysqlsqldatabasequery-optimization

How to get a single record per group from one day ago in a very large MySQL table?


Suppose I have a table containing various types of fruit and vegetables. Once every 5-10 minutes, I am checking the current price of all items I have listed, and I insert a new record for that timestamp with the current price.

The following is an example of such a table (number of rows reduced for readability, but imagine there are entries every 5 minutes for each fruit):

| slug   | price | time                |
+--------+-------+---------------------+
| tomato | 1.5   | 2022-02-05 18:05:00 |
| tomato | 1.5   | 2022-02-05 21:05:00 |
| tomato | 1.55  | 2022-02-06 01:05:00 |
| tomato | 1.7   | 2022-02-06 08:05:00 |
| tomato | 1.65  | 2022-02-06 14:05:00 |
| tomato | 1.63  | 2022-02-07 02:05:00 |
| tomato | 1.69  | 2022-02-07 10:05:00 |
| tomato | 1.7   | 2022-02-07 18:05:00 |
| tomato | 1.49  | 2022-02-08 04:05:00 |
| tomato | 1.58  | 2022-02-08 18:05:00 |
| kiwi   | 0.9   | 2022-02-05 18:05:00 |
| kiwi   | 0.95  | 2022-02-05 21:05:00 |
| kiwi   | 0.81  | 2022-02-06 01:05:00 |
| kiwi   | 1.01  | 2022-02-06 08:05:00 |
| kiwi   | 1.05  | 2022-02-06 14:05:00 |
| kiwi   | 1.1   | 2022-02-07 02:05:00 |
| kiwi   | 1.08  | 2022-02-07 10:05:00 |
| kiwi   | 0.95  | 2022-02-07 18:05:00 |
| kiwi   | 1.04  | 2022-02-08 04:05:00 |
| kiwi   | 1.15  | 2022-02-08 18:05:00 |
| lemon  | 1.69  | 2022-02-05 18:05:00 |
| lemon  | 1.3   | 2022-02-05 21:05:00 |
| lemon  | 1.35  | 2022-02-06 01:05:00 |
| lemon  | 1.35  | 2022-02-06 08:05:00 |
| lemon  | 1.3   | 2022-02-06 14:05:00 |
| lemon  | 1.35  | 2022-02-07 02:05:00 |
| lemon  | 1.41  | 2022-02-07 10:05:00 |
| lemon  | 1.49  | 2022-02-07 18:05:00 |
| lemon  | 1.5   | 2022-02-08 04:05:00 |
| lemon  | 1.49  | 2022-02-08 18:05:00 |

Now, I want to select all items that I have, and show what their price was 24 hours ago. Using the following query, I can fetch a single entry per each such item that matches the "1 day ago" criteria:

SELECT slug, price, time
FROM items 
WHERE time >= NOW() - INTERVAL 1 DAY
GROUP BY slug

The problem, however, is that over time the size of the table increases substantially, and this query, that used to take a fraction of a second, now takes 5-10 seconds (in my current table I have roughly 9 million records like these, with some additional columns of course).

Is there a more optimal way to execute this?


Solution

  • I guess you want to display ....

    • the slug, price, and time of
    • the oldest row in the table
    • that's newer than one day ago
    • for each slug.

    First, get the times of the records you want with this subquery.

             SELECT MIN(time) time, slug
               FROM items
              WHERE time >= NOW() - INTERVAL 24 HOUR
              GROUP BY slug
    

    Then join that subquery to your table like this.

    SELECT items.slug, items.price, items.time
      FROM items
      JOIN (
             SELECT MIN(time) time, slug
               FROM items
              WHERE time >= NOW() - INTERVAL 1 DAY
              GROUP BY slug
           ) h ON items.slug = h.slug AND items.time = h.time
     ORDER BY slug
    

    This index will help you make this fast. The server can jump immediately to the relevant rows by time.

    ALTER TABLE CREATE INDEX timeslug (time, slug);