Search code examples
sqlsqlitegroup-bysql-order-bygreatest-n-per-group

Select the latest 3 records for each ID in a table


I have a table with a composite primary key (ID, Date) like below.

+------+------------+-------+
|  ID  |    Date    | Value |
+------+------------+-------+
|   1  | 1433419200 |   15  |
|   1  | 1433332800 |   23  |
|   1  | 1433246400 |   41  |
|   1  | 1433160000 |   55  |
|   1  | 1432900800 |   24  |
|   2  | 1433419200 |   52  |
|   2  | 1433332800 |   23  |
|   2  | 1433246400 |   39  |
|   2  | 1433160000 |   22  |
|   3  | 1433419200 |   11  |
|   3  | 1433246400 |   58  |
|  ... |    ...     |  ...  |
+------+------------+-------+

There is also a separate index on Date column. The table is of moderate size, currently ~600k row and growing by ~2k everyday.

I want to do a single SELECT query that returns the latest 3 records (ordered by Date timestamp) for each ID. For each given ID, the Date values are always unique, so no need to worry about ties for Date here.

I've tried a self-join approach, inspired by this answer, but it took quite a few seconds to run and returned nothing:

SELECT p1.ID, p1.Date, p1.Value FROM MyTable AS p1
LEFT JOIN MyTable AS p2 
ON p1.ID=p2.ID AND p1.Date<=p2.Date
GROUP BY p1.ID
HAVING COUNT(*)<=5
ORDER BY p1.ID, p1.Date DESC;

What would be a fast solution here?


Solution

  • You could look up the three most recent dates for each ID:

    SELECT ID, Date, Value
    FROM MyTable
    WHERE Date IN (SELECT Date
                   FROM MyTable AS T2
                   WHERE T2.ID = MyTable.ID
                   ORDER BY Date DESC
                   LIMIT 3)
    

    Alternatively, look up the third most recent date for each ID, and use it as a limit:

    SELECT ID, Date, Value
    FROM MyTable
    WHERE Date >= IFNULL((SELECT Date
                          FROM MyTable AS T2
                          WHERE T2.ID = MyTable.ID
                          ORDER BY Date DESC
                          LIMIT 1 OFFSET 2),
                         0)
    

    Both queries should get good performance from the primary key's index.