Search code examples
postgresqldatetimemaxtruncatedistinct-on

Select the maximum rows of sorted subgroups


Using PostgreSQL 11, I have a table containing a DAY and MONTH_TO_DAY entry for each day of every month. I would like to select the most recent MONTH_TO_DAY entry for each account. My table is:

+------+------------+--------------+------------+--------------------------+
|id    |account     |code          |interval    |timestamp                 |
+------+------------+--------------+------------+--------------------------+
|387276|ALPBls6EsP  |52            |MONTH_TO_DAY|2020-09-01 01:05:00.000000|
|387275|ALPBls6EsP  |52            |DAY         |2020-09-01 01:05:00.000000|
|387272|YkON8lk8A8  |25            |MONTH_TO_DAY|2020-09-01 01:05:00.000000|
|387271|YkON8lk8A8  |25            |DAY         |2020-08-01 01:05:00.000000|
|387273|ALPBls6EsP  |32            |MONTH_TO_DAY|2020-08-31 01:05:00.000000|
|387274|ALPBls6EsP  |32            |DAY         |2020-08-31 01:05:00.000000|
|387272|ALPBls6EsP  |27            |MONTH_TO_DAY|2020-08-30 01:05:00.000000|
|387271|ALPBls6EsP  |27            |DAY         |2020-08-30 01:05:00.000000|
+------+------------+--------------+------------+--------------------------+

If it helps, the entries are always in descending order timewise.

In a query asking for all accounts, since the 31st is the last day of 08 and the 1st is the most recent entry of 09, my expected output would be

+------+------------+--------------+------------+--------------------------+
|id    |account     |code          |interval    |timestamp                 |
+------+------------+--------------+------------+--------------------------+
|387276|ALPBls6EsP  |52            |MONTH_TO_DAY|2020-09-01 01:05:00.000000|
|387272|YkON8lk8A8  |25            |MONTH_TO_DAY|2020-09-01 01:05:00.000000|
|387273|ALPBls6EsP  |32            |MONTH_TO_DAY|2020-08-31 01:05:00.000000|
+------+------------+--------------+------------+--------------------------+

I was thinking I'd like to group entries by month (truncate the dd/hh/ss), and then select the row with the maximum timestamp in each group. I can get the right rows with this but I can't figure out how to get any of the other fields.

SELECT max(timestamp) 
FROM mytable 
GROUP BY date_trunc('month', mytable.timestamp);

I also thought I could use distinct on something like the below, but I'm not too familiar with distinct on or date_trunc and I can't figure out how to use them together.

SELECT distinct on (timestamp)
    *
FROM mytable
ORDER BY date_trunc('month', mytable.timestamp)

Solution

  • You do want distinct on, but you want to apply it to the account:

    select distinct on (account) *
      from mytable
     where interval = 'MONTH_TO_DAY'
     order by account, timestamp desc;
    

    If you want the latest by account by month, then this should work:

    select distinct on (date_trunc('month', timestamp), account) *
      from mytable
     where interval = 'MONTH_TO_DAY'
     order by date_trunc('month', timestamp), account, timestamp desc;