Search code examples
sqldatabaseoracle

SQL - Lag to previous distinct value


Is there a way to modify a Lag function to not only go back a previous row, but back to a previous distinct value?

Current output:

 Job Date      Previous Job Date   
-----------    ---------------- 
06/10/2013             -
06/10/2013         06/10/2013
06/10/2013         06/10/2013
07/16/2014         06/10/2013
07/16/2014         07/16/2014
06/07/2015         07/16/2014                   
06/07/2015         06/07/2015
06/07/2015         06/07/2015

Desired output:

 Job Date      Previous Job Date   
-----------    ---------------- 
06/10/2013             -
06/10/2013             -
06/10/2013             -
07/16/2014         06/10/2013
07/16/2014         06/10/2013
06/07/2015         07/16/2014                   
06/07/2015         07/16/2014 
06/07/2015         07/16/2014 

Right now i'm using

Lag(a1."Job Date", 1) over (partition by a1."Employee Number" Order By a1."Employee Number")

but this is returning the previous rows record instead of the previous distinct or non-matching record. Is this possible with lag or perhaps a different function?


Solution

  • This is a nifty trick. If you use the range between with "1 PRECEDING" it starts the window at the previous distinct value. Assumes you have some key you want to partition on, but if that's not the case you can just remove the PARTITION BY clause.

    Fiddle: https://dbfiddle.uk/3tPzIiDN

    create table some_test_data
      (
        id integer,
        job_date date
      );
    
    insert into some_test_data values
      (1, date '2023-06-01'),
      (1, date '2023-06-01'),
      (1, date '2023-06-01'),
      (1, date '2023-07-01'),
      (1, date '2023-07-01'),
      (1, date '2023-07-16'),
      (1, date '2023-07-16'),
      (1, date '2023-07-16');
    
    select s.*,
           max(job_date) over (
             partition by id
             order by job_date
             range between unbounded preceding and 1 preceding)
      from some_test_data s;
    

    Or, the equivalent with correlated subquery

    select t1.*,
           ( select max(t2.job_date)
               from some_test_data t2
              where t1.id = t2.id
                and t1.job_date > t2.job_date
           ) as last_distinct_job_date
      from some_test_data t1;
    
    ID JOB_DATE LAST_DISTINCT_JOB_DATE
    1 01-JUN-23 null
    1 01-JUN-23 null
    1 01-JUN-23 null
    1 01-JUL-23 01-JUN-23
    1 01-JUL-23 01-JUN-23
    1 16-JUL-23 01-JUL-23
    1 16-JUL-23 01-JUL-23
    1 16-JUL-23 01-JUL-23