Search code examples
sqlpostgresqlgreatest-n-per-group

PostgreSQL Select the r.* by MIN() with group-by on two columns


The example schema of a table called results

id user_id activity_id activity_type_id start_date_local elapsed_time
1 100 11111 1 2014-01-07 04:34:38 4444
2 100 22222 1 2015-04-14 06:44:42 5555
3 100 33333 1 2015-04-14 06:44:42 7777
4 100 44444 2 2014-01-07 04:34:38 12345
5 200 55555 1 2015-12-22 16:32:56 5023

The problem

Select the results of fastest activities (i.e. minimum elapsed time) of each user by activity_type_id and year.

(Basically, in this simplified example, record ID=3 should be excluded from the selection, because record ID=2 is the fastest for user 100 of the given activity_type_id 1 and the year of 2015)

What I have tried

SELECT user_id,
       activity_type_id,
       EXTRACT(year FROM start_date_local) AS year,
       MIN(elapsed_time) AS fastest_time
FROM results
GROUP BY activity_type_id, user_id, year
ORDER BY activity_type_id, user_id, year;

Actual

Which selects the correct result set I want, but only contains the grouped by columns

user_id activity_type_id year fastest_time
100 1 2014 4444
100 1 2015 5555
100 2 2014 12345
200 1 2015 5023

Goal

To have the actual full record with all columns. i.e. results.* + year

id user_id activity_id activity_type_id start_date_local year elapsed_time
1 100 11111 1 2014-01-07 04:34:38 2014 2014
2 100 22222 1 2015-04-14 06:44:42 2015 5555
4 100 44444 2 2014-01-07 04:34:38 2014 12345
5 200 55555 1 2015-12-22 16:32:56 2015 5023

Solution

  • I think you want this:

    SELECT DISTINCT ON (user_id, activity_type_id, EXTRACT(year FROM start_date_local)) 
         *, EXTRACT(year FROM start_date_local) AS year
    FROM results
    ORDER BY user_id, activity_type_id, year, elapsed_time;