Search code examples
sqloracle-databaseoracle12c

Fetch minimum effective date comparing current and previous rows


Im working on Oracle 12c, I have a requirement to fetch the minimum effective date based on group by of columns, however for only period until there is no change, example below :

Assume we have Employee ID who changes departments over period of time and we want to capture the minimum effective date per department as below but minimum effective date only until the change.

EMP_ID EFF_DT DEPT_ID
100 01/01/2015 ENGINEERING
100 01/01/2016 ENGINEERING
100 01/01/2017 ENGINEERING
100 01/01/2018 FINANCE
100 01/01/2019 FINANCE
100 01/01/2020 ENGINEERING
100 01/01/2021 ENGINEERING

Target:

EMP_ID EFF_DT DEPT_ID
100 01/01/2015 ENGINEERING
100 01/01/2018 FINANCE
100 01/01/2020 ENGINEERING

How to accomplish ? I tried doing LAG and trying to compare current and previous but not able to identify the minimum during the timeframe of no change.


Solution

  • Solution using match_recognize (available since version 12.1 of the database)

    Setting up data for testing:

    alter session set nls_date_format='mm/dd/yyyy';
    
    create table my_table (emp_id, eff_dt, dept_id) as
      select 100, to_date('01/01/2015'), 'ENGINEERING' from dual union all
      select 100, to_date('01/01/2016'), 'ENGINEERING' from dual union all
      select 100, to_date('01/01/2017'), 'ENGINEERING' from dual union all
      select 100, to_date('01/01/2018'), 'FINANCE'     from dual union all
      select 100, to_date('01/01/2019'), 'FINANCE'     from dual union all
      select 100, to_date('01/01/2020'), 'ENGINEERING' from dual union all
      select 100, to_date('01/01/2021'), 'ENGINEERING' from dual
    ;
    

    Query and output:

    select emp_id, eff_dt, dept_id
    from   my_table
    match_recognize(
      partition by emp_id
      order     by eff_dt
      all rows per match
      pattern   ( a {- b* -} )
      define    b as dept_id = a.dept_id
    );
    
        EMP_ID EFF_DT     DEPT_ID    
    ---------- ---------- -----------
           100 01/01/2015 ENGINEERING
           100 01/01/2018 FINANCE    
           100 01/01/2020 ENGINEERING
    

    Brief explanation:

    The match_recognize clause partitions the input rows by emp_id and orders them by eff_dt. Then it further divides the rows, within each partition and following the date order, into "matches", matching the pattern given in the pattern clause. That is a single a row followed by 0 or more b rows, where b is defined (in the define clause) to require that the dept_id be the same as in the first row of the match. There are no conditions for a row to be an a row; any row, if it cannot be classified as b, will be classified as a (and it will start a new match!)

    "All" rows in a match are returned, except those enclosed in {- -} in the pattern clause. That is: the a row (first row) in each match is returned, while the b rows are not. Exactly as specified.

    EDIT

    For Oracle version 11.2 or lower, and also for database products (other than Oracle) that don't yet support match_recognize, this can be done with analytic functions, which are supported by the vast majority of databases.

    The version below does almost exactly the same things as the match_recognize solution:

    select emp_id, eff_dt, dept_id
    from   (
             select emp_id, eff_dt, dept_id,
                    case when lag(dept_id) over (partition by emp_id
                                                 order     by eff_dt) = dept_id
                         then 'B' else 'A' end as classifier
             from   my_table
           )
    where  classifier = 'A'
    ;