
Get the maximum value from rows in Postgres records and group by multiple columns


I have a table like this:

p_id |      createdat      | pagetitle | sessionid |      text       | device  | deviceserial
------+---------------------+-----------+-----------+-----------------+---------+--------------
      | 2020-11-27 08:07:39 |           |           | App launch      | android | 636363636890
      | 2020-09-01 08:08:18 |           |           | search          | Android | 636363636890
      | 2020-09-02 08:10:10 |           |           | scan            | Android | 636363636890
      | 2020-09-02 08:12:10 |           |           | destroy         | Android | 636363636890
      | 2020-09-02 08:40:11 |           |           | hi              | IOS     | 6625839827
      | 2020-09-02 08:45:11 |           |           | hi              | IOS     | 6625839827
      | 2020-09-02 08:43:10 |           |           | launchComponent | Android | 636363636890
      | 2020-09-02 08:50:11 |           |           | hi              | IOS     | 6625839827
      | 2020-09-02 08:47:10 |           |           | launchComponent | Android | 636363636890
      | 2020-09-02 08:53:11 |           |           | hi              | IOS     | 6625839827
      | 2020-09-02 08:50:10 |           |           | launchComponent | Android | 636363636890
      | 2020-09-02 08:55:11 |           |           | hi              | IOS     | 6625839827
      | 2020-09-02 08:52:10 |           |           | launchComponent | Android | 636363636890
      | 2020-09-02 09:00:11 |           |           | hi              | IOS     | 6625839827
      | 2020-09-02 08:55:10 |           |           | launchComponent | Android | 636363636890
      | 2020-09-02 09:05:11 |           |           | hi              | IOS     | 6625839827
      | 2020-09-02 08:59:10 |           |           | launchComponent | Android | 636363636890
      | 2020-09-02 09:07:11 |           |           | hi              | Android | 6625839827
      | 2020-09-02 09:01:10 |           |           | launchComponent | Android | 636363636890
      | 2020-09-02 09:09:11 |           |           | hi              | IOS     | 6625839827
      | 2020-09-02 09:03:10 |           |           | launchComponent | Android | 636363636890
      | 2020-09-02 09:09:11 |           |           | hi              | Android | 6625839828
      | 2020-09-02 09:03:10 |           |           | launchComponent | IOS     | 636363636891
      | 2020-09-02 09:13:11 |           |           | hi              | Android | 6625839828
      | 2020-09-02 09:06:10 |           |           | launchComponent | IOS     | 636363636891

From this table, I want to get something like this:

 deviceserial | event_count |         hr          | device
--------------+-------------+---------------------+---------
 6625839828   |           2 | 2020-09-02 09:00:00 | Android
 636363636890 |           8 | 2020-09-02 08:00:00 | Android
 636363636891 |           2 | 2020-09-02 09:00:00 | IOS
 6625839827   |           5 | 2020-09-02 08:00:00 | IOS
 

These are my steps: I grouped the records by deviceserial, by hour (as hr) and by device, counted the events in each group as event_count, and then tried to take max(event_count).

I tried this query:

select deviceserial,max(event_count) as event_count,hr,device
from (
    select deviceserial,count(*) as event_count,
        date_trunc('hour', createdat) as hr,device
    from devices  
    group by deviceserial,hr,device
) t
group by deviceserial,hr,device

This is my result:

 deviceserial | event_count |         hr          | device
--------------+-------------+---------------------+---------
 636363636890 |           1 | 2020-11-27 08:00:00 | android
 636363636891 |           2 | 2020-09-02 09:00:00 | IOS
 6625839827   |           4 | 2020-09-02 09:00:00 | IOS
 6625839827   |           5 | 2020-09-02 08:00:00 | IOS
 636363636890 |           8 | 2020-09-02 08:00:00 | Android
 636363636890 |           1 | 2020-09-01 08:00:00 | Android
 636363636890 |           2 | 2020-09-02 09:00:00 | Android
 6625839828   |           2 | 2020-09-02 09:00:00 | Android

Solution

  • If I follow you correctly, you can use distinct on:

    select distinct on (deviceserial) 
        deviceserial,
        count(*) as event_count,
        date_trunc('hour', createdat) as hr,
        device
    from devices  
    group by deviceserial, hr, device
    order by deviceserial, event_count desc
    

    This gives you, for each device serial, the hour and device with the most events. Note, however, that it does not handle ties: distinct on keeps exactly one row per device serial, and which of the tied rows is kept is not defined unless the order by breaks the tie. To keep all top ties, use rank() instead:

    select *
    from (
        select deviceserial,
            count(*) as event_count,
            date_trunc('hour', createdat) as hr,
            device,
            -- a window's order by cannot see the event_count alias, so repeat the aggregate
            rank() over(partition by deviceserial order by count(*) desc) rn
        from devices  
        group by deviceserial, hr, device
    ) t
    where rn = 1
    order by deviceserial
    

    Or, in Postgres 13 and later, using fetch first ... with ties:

    select deviceserial,
        count(*) as event_count,
        date_trunc('hour', createdat) as hr,
        device
    from devices  
    group by deviceserial, hr, device
    order by rank() over(partition by deviceserial order by count(*) desc)
    fetch first row with ties
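
    If exactly one row per device serial is wanted but the choice among tied hours should be predictable, one option is to add a tie-breaker to the distinct on ordering. The sketch below is only an illustration under assumptions: devices_demo is a hypothetical scratch table, its rows are made up (they are not the question's data), and "latest hour wins" is an arbitrary rule chosen for the example.

    ```sql
    -- Hypothetical scratch table; column names mirror the question, rows are invented.
    create temp table devices_demo (
        createdat    timestamp,
        device       text,
        deviceserial text
    );

    insert into devices_demo (createdat, device, deviceserial) values
        ('2020-09-02 08:10:00', 'Android', 'A1'),
        ('2020-09-02 08:20:00', 'Android', 'A1'),
        ('2020-09-02 09:10:00', 'Android', 'A1'),
        ('2020-09-02 09:20:00', 'Android', 'A1'),  -- A1 now ties: 2 events at 08:00 and 2 at 09:00
        ('2020-09-02 08:30:00', 'IOS',     'B2');

    -- One row per serial; when counts are equal, the extra sort key
    -- (hr desc) makes the latest hour the one that distinct on keeps.
    select distinct on (deviceserial)
        deviceserial,
        count(*) as event_count,
        date_trunc('hour', createdat) as hr,
        device
    from devices_demo
    group by deviceserial, hr, device
    order by deviceserial, event_count desc, hr desc;
    ```

    With these rows, A1 should come back once, for the 09:00 hour, whereas a rank()-based query over the same rows would return both of A1's tied hours. Swapping hr desc for hr asc would keep the earliest hour instead.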