Search code examples
sqlpostgresqlwindow-functions

first_value and last_value for each user id


I am trying to get the first scheduled start time and last scheduled end time for each user in my scheduled_jobs table for a date.

I can get it to work for one user, but when trying for all users per day I can get the last time, but the cannot get the correct first time it is displaying the time for the first user_id for all user_id's.

Here is my code:

SELECT DISTINCT on (user_id)
    user_id, first_value(scheduled_jobs.at) over (order by user_id, scheduled_jobs.at ASC),
    last_value(scheduled_jobs.to) over (order by user_id, scheduled_jobs.at DESC)
FROM scheduled_jobs
WHERE scheduled_jobs.at between CURRENT_DATE+INTERVAL'3 day' and CURRENT_DATE +INTERVAL '4 day'

Example of current results:

user_id | first_value         | last_value
  19    | 2018-10-29 07:00:00 | 2018-10-29 17:00:00
  30    | 2018-10-29 07:00:00 | 2018-10-29 15:00:00
  37    | 2018-10-29 07:00:00 | 2018-10-29 16:30:00 

Last_value is showing correctly for each user_id, however first_value is always showing the value for the first user_id for all.

I have tried spliting them into different SELECT queries with a JOIN and a USING query but still getting incorrect results for the first_value.


Solution

  • You need a PARTITION BY clause, which generates the frame per user_id

    SELECT DISTINCT on (user_id)
        user_id, 
        first_value(sj.at) OVER (PARTITION BY user_id ORDER BY sj.at ASC),
        last_value(sj.to) OVER (PARTITION BY user_id ORDER BY sj.at DESC)
    FROM 
        scheduled_jobs sj
    WHERE 
        sj.at BETWEEN CURRENT_DATE + 3 and CURRENT_DATE + 4
    

    Addionally: Please be careful by using last_value. Sometimes it would not work as expected. See here

    You should use first_value with DESC ordering instead:

    first_value(scheduled_jobs.at) over (partition by user_id order by scheduled_jobs.at DESC)