Postgres, get unique records per day from date range select


I need to build a report of users who logged in within a date range, without duplicates on the same day (if someone logged in twice on the same day, they should be listed only once). Unfortunately, we keep login information as JSON (I can't move it to a separate table; I don't know who designed this database). This query shows all logins:

select a.id, username, email, ah.modified as login_date
from accounts a join
     account_history ah
     on modified_acc_id = a.id
 where ah.data::jsonb->>'message' = 'Logon';

modified is a timestamp with time zone and is used as the login date.

I have only found examples that count distinct ids per day, but I don't know how to modify them to return distinct rows per day.

sample data:

 id  | username |       email       |         login_date
-----+----------+-------------------+----------------------------
 102 | example  | [email protected] | 2018-12-06 09:30:10.573+00
 102 | example  | [email protected] | 2018-12-06 09:32:34.235+00
  42 | rafal    | [email protected] | 2018-12-06 09:45:24.884+00
 576 | john     | [email protected] | 2018-12-06 09:35:24.922+00
 576 | john     | [email protected] | 2018-12-07 09:58:04.253+00

wanted data:

 id  | username |       email       |         login_date
-----+----------+-------------------+----------------------------
 102 | example  | [email protected] | 2018-12-06 09:30:10.573+00
  42 | rafal    | [email protected] | 2018-12-06 09:45:24.884+00
 576 | john     | [email protected] | 2018-12-06 09:35:24.922+00
 576 | john     | [email protected] | 2018-12-07 09:58:04.253+00

As you can see, the second row (the repeated login of id 102 on the same day) is left out.


Solution

  • DISTINCT ON returns exactly the first row of each ordered group. In your example, the group is the id together with the date part of the login_date timestamp:

    SELECT DISTINCT ON (id, login_date::date)
        *
    FROM (
        -- <your query>
    ) s
    ORDER BY id, login_date::date, login_date
    


    Explanation of ORDER BY clause:

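    The ORDER BY list has to start with the DISTINCT ON expressions. In your case, though, you don't want to order by the date only but by its time part too, because that ordering decides which row counts as the first of its group. So after ordering by id and date (which is required by the DISTINCT ON expressions), you order by the full timestamp as well, which keeps the earliest login of each day.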
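
    To make the effect of the ORDER BY keys concrete, here is a minimal sketch that runs without any tables. The logins CTE is a hypothetical stand-in for the joined query and holds the sample rows from the question; the email column is left out because its values are not readable in the sample.

    ```sql
    -- Hypothetical stand-in for "<your query>": the sample rows from the question.
    WITH logins (id, username, login_date) AS (
        VALUES
            (102, 'example', '2018-12-06 09:30:10.573+00'::timestamptz),
            (102, 'example', '2018-12-06 09:32:34.235+00'::timestamptz),
            ( 42, 'rafal',   '2018-12-06 09:45:24.884+00'::timestamptz),
            (576, 'john',    '2018-12-06 09:35:24.922+00'::timestamptz),
            (576, 'john',    '2018-12-07 09:58:04.253+00'::timestamptz)
    )
    SELECT DISTINCT ON (id, login_date::date)
        id, username, login_date
    FROM logins
    -- The first two keys must match the DISTINCT ON expressions;
    -- the third key makes the earliest login the first row of each group.
    ORDER BY id, login_date::date, login_date;
    ```

    This should return four rows and drop the 09:32:34 login of id 102. The output comes back sorted by id, so the row order differs from the wanted data in the question; wrap the query in an outer SELECT with its own ORDER BY if a different order is needed.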


    The whole query can then be written without the subquery:

    SELECT DISTINCT ON (a.id, ah.modified::date)
        a.id,
        username,
        email,
        ah.modified AS login_date
    FROM accounts a
    JOIN account_history ah
        ON modified_acc_id = a.id
    WHERE ah.data::jsonb->>'message' = 'Logon'
    ORDER BY a.id, ah.modified::date, ah.modified
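
    One thing to keep in mind: because modified is a timestamp with time zone, the ::date cast uses the session's TimeZone setting, so what counts as "the same day" can change between sessions. If the report should use a fixed zone, the cast can be pinned with AT TIME ZONE. The sketch below assumes UTC as the reporting zone and uses two hypothetical logins, one on each side of midnight UTC.

    ```sql
    -- Hypothetical data: two logins of the same user around midnight UTC.
    WITH logins (id, login_date) AS (
        VALUES
            (1, '2018-12-06 23:30:00+00'::timestamptz),
            (1, '2018-12-07 00:30:00+00'::timestamptz)
    )
    -- 'UTC' is an assumed reporting zone; replace it with the zone the report needs.
    SELECT DISTINCT ON (id, (login_date AT TIME ZONE 'UTC')::date)
        id, login_date
    FROM logins
    ORDER BY id, (login_date AT TIME ZONE 'UTC')::date, login_date;
    ```

    Here both rows are kept, because they fall on different UTC days. With a plain login_date::date in a session whose TimeZone is, for example, Europe/Warsaw, both logins fall on 2018-12-07 local time and only the first would be returned. The same expression can be substituted for ah.modified::date in the query above.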