Search code examples
sqlpostgresqlgreatest-n-per-group

Get latest record based on two fields


I have a table with the following fields:

user_id         year      month     amount    type
--------------------------------------------------
5               2018      1         100       Foo
6               2017      12        100       Bar
6               2018      11        100       Foo
6               2018      12        100       Bar
7               2018      12        100       Foo
7               2019      12        100       Bar
8               2019      12        100       Foo

I'm looking to do 3 things:

  1. Get the latest record per user id (1 record, latest year and month field) for certain users,

something like:

select *
from myTable
where user_id in (6,7) and <is latest year / month>

which should return

user_id         year      month     amount     type
---------------------------------------------------
6               2018      12        100        Bar
7               2019      12        100        Foo
  1. calculate totals for the above query, something like:
    select SUM(amount) as myTotal, avg(amount) as myAverage, 
    (count # of foos in result) as numberOfFoos,
    (count # of bars in result) as numberOfBars
    from myTable
    where user_id in (6,7) and <is latest year / month>

which would return one row with the custom fields:

myTotal     myAverage     numberOfFoos      numberOfBars
--------------------------------------------------------
300         100           2                 1
  1. Same as above query, but for EVERY MONTH for a period of time, eg. for the last 3-5 years, ideally in the fewest calls possible, instead of looping through 36+ months manually and making separate calls.
    year     month    myTotal     myAverage     numberOfFoos      numberOfBars
    --------------------------------------------------------------------------
    2018     1        300         100           2                 1
    2018     2        300         100           2                 1
    2018     3        300         100           2                 1
    ...
    2020     12       300         100           2                 1

Solution

  • First problem can be solved using row_number function:

    with d as (
      select 
        users.*, 
        row_number() over (partition by user_id  order by year desc, month desc ) rn 
      from  users 
      where  user_id in (5, 6)
    ) 
    select * from d
    where  rn = 1;
    

    PostgreSQL fiddle here

    Second problem can be solved using same way:

    with d as (
      select 
        users.*, 
        row_number() over (partition by user_id  order by year desc, month desc) rn 
      from  users 
      where  user_id in (6, 7)
    ) 
    select 
        sum(amount), avg(amount), 
        count(*) filter (where type = 'Foo') count_foo, 
        count(*) filter (where type = 'Bar') count_bar
    from d
    where  rn = 1;
    

    Test solution here