Search code examples
mysqlgreatest-n-per-groupgroupwise-maximum

MySQL group and select only first from each group


I would like to perform a query that will select only the most recent item from a given group.


In this example, I'm tracking vans:

  • Each time they return to base, a check-in is recorded with information - mileage, etc...
  • Each time they make a delivery, a delivery is recorded - customer, etc...

This table lets us know the history for a given van. The data can be produced with a query or stored as we go - this isn't the problem.

 id | checkin_id | delivery_id | van_id
----+------------+-------------+--------
 24 | 15         | NULL        | 3
 25 | NULL       | 28          | 3
 26 | 16         | NULL        | 4
 27 | NULL       | 29          | 3
 28 | NULL       | 30          | 4
 29 | 17         | NULL        | 5

I can see the van's history by querying with ... WHERE van_id=3; - fine.

Conversely, I would like to be able to get a list of vans with their most recent "event". Resulting in this:

 id | checkin_id | delivery_id | van_id
----+------------+-------------+--------
 27 | NULL       | 29          | 3
 28 | NULL       | 30          | 4
 29 | 17         | NULL        | 5

I jumped to the following query:

SELECT * FROM `history` GROUP BY `van_id`;

But this returns the following error:

#1055 - Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'database.history.checkin_id' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

After reading up, I understand what this is about and have to admit that my SQL is somewhat out of date - which of the items from the group do I want returned?

Adding checkin_id and delivery_id to the GROUP BY just shifts the problem - Ultimately I end up with the same set of data, just sorted differently.


This answer piqued my interest, and the graphic really helps to clearly outline the problem, thanks @azerafati!

I want to use the FIRST() or LAST() aggregate function - but MySQL doesn't appear to have them.

How do I reproduce this behaviour without processing all of the data in my application?


Solution

  • I guess your id values are unique, and later records have higher values than earlier records.

    You need to use a subquery that gets the latest id for each van:

             SELECT MAX(id) id, van_id
               FROM history
              GROUP BY van_id
    

    Then join that to your detail query.

     SELECT h.*
       FROM history h
       JOIN (
             SELECT MAX(id) id, van_id
               FROM history
              GROUP BY van_id
            ) m ON h.id = m.id AND h.van_id = m.van_id
    

    But because your id values are unique you can simplify this even more.

     SELECT h.*
       FROM history h
       JOIN (
             SELECT MAX(id) id
               FROM history
              GROUP BY van_id
            ) m ON h.id = m.id