Search code examples
mysqlsqldatetimemaxgreatest-n-per-group

Get the newest record from MySQL from 2 tables more optimalized


I have some problems with query in SQL. I have 2 tables.

people
    +----+--------+------+
    | id |  name  | val2 |
    +----+--------+------+
    |  1 | john   |   12 |
    |  2 | adam   |    5 |
    |  3 | alfred |    3 |
    +----+--------+------+

data
+----+----+----+-----+---------------------+
| id | v1 | v2 | v3  |        date         |
+----+----+----+-----+---------------------+
|  1 |  4 | 15 |  18 | 2020-10-16 11:15:53 |
|  1 |  2 | 12 |  17 | 2020-10-16 11:22:53 |
|  1 |  3 | 13 |  16 | 2020-10-16 11:32:53 |
|  2 |  1 | 16 |  15 | 2020-10-16 13:22:53 |
|  2 |  3 | 13 |  25 | 2020-10-16 13:42:53 |
|  2 |  4 | 12 |  35 | 2020-10-16 14:12:53 |
|  3 |  1 | 21 | 12  | 2020-10-16 14:12:53 |
|  3 |  2 | 28 | 42  | 2020-10-16 15:12:53 |
|  3 |  4 | 30 | 72  | 2020-10-16 16:12:53 |
+----+----+----+-----+---------------------+

I need to get in one table ID, NAME, v1,v2,v3,date for the new date to all object from first table something like this:

RESULT
+----+--------+----+----+-----+---------------------+
| id |  name  | v1 | v2 | v3  |        date         |
+----+--------+----+----+-----+---------------------+
|  1 | john   |  3 | 13 |  16 | 2020-10-16 11:32:53 |
|  2 | adam   |  4 | 12 |  35 | 2020-10-16 14:12:53 |
|  3 | alfred |  4 | 30 | 72  | 2020-10-16 16:12:53 |
+----+--------+----+----+-----+---------------------+

I need the newest record from SECOND TABLE for all people from first table. I try do it by this query:

SELECT people.id,
   people.name,
   data.v1,
   data.v2,
   data.v3,
   max(data.date)
FROM people
JOIN DATA ON people.id = data.id
GROUP BY people.id

I got the newest data but v1, v2, v3 is random from table.


Solution

  • You want entire rows from data, so aggregation is not an option here. In most databases, your query would fail, because the select and group by clause are not consistent... But MySQL, somehow unfortunaltely, gives you enough rope to developers to to hang themselves with. Your query runs (if sql mode ONLY_FULL_GROUP_BY is disabled), but is actually equivalent to:

    SELECT people.id, people.name, ANY_VALUE(data.v1), ANY_VALUE(data.v2), ANY_VALUE(data.v3), MAX(data.date)
    FROM people
    JOIN data on people.id = data.id
    GROUP BY people.id
    

    Now it is plain to see that the database gives you any value of data rows that match the join condition - which may, or may not belong to the row that has the latest date.

    Instead of grouping, you actually need to filter. One option uses a subquery:

    select p.id, p.name, d.v1, d.v2, d.v3, d.date
    from people p
    inner join data d on d.id = p.id
    where d.date = (select max(d1.date) from data d1 where d1.id = d.id)
    

    The upside of this approach is that it works in all versions of MySQL, including pre-8.0, where window functions are not available.