Search code examples
sqlpostgresqlselectcountgreatest-n-per-group

Select first `n` rows of a grouped query


I am using PostgreSQL with SQLAlchemy

I have a table of GPS metrics in the form:

SELECT * FROM user_gps_location;

My Output:

| id | user_id | entry_time                 | lat         | lng           | accuracy | altitude | speed |
| 1  | 54      | 2020-07-24 14:08:30.000000 | 54.42184220 | -110.21029370 |    41.42 |   512.40 |  0.07 |
| 2  | 54      | 2020-07-24 22:20:12.000000 | 54.42189750 | -110.21038070 |    13.00 |   512.60 |  0.00 |
| 3  | 26      | 2020-07-27 13:51:11.000000 | 54.41453910 | -110.20775990 |  1300.00 |     0.00 |  0.00 |
| 4  | 26      | 2020-07-27 22:59:00.000000 | 54.42122590 | -110.20959960 |   257.52 |   509.10 |  0.00 |
| 5  | 26      | 2020-07-28 13:54:12.000000 | 54.42185280 | -110.21025010 |    81.45 |   510.20 |  0.00 |
...

I need to be able to answer the question "What are the latest 5 entries for each user since "", sorted by entry_time

Right now I only have a basic query:

select *
from user_gps_location
where user_id in (select distinct user_id
                  from user_gps_location
                  where entry_time > '2020-09-01')
  and entry_time > '2020-09-01';

Applying a limit will not do what I want. I assume I need to use a grouping and window functions (?), but I do not understand them.


Solution

  • The row_number function is exactly what you're looking for:

    SELECT * 
    FROM   (SELECT *, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY entry_time DESC) AS rn
            FROM   user_gps_location
            WHERE  entry_time > '2020-09-01') t
    WHERE  rn <= 5