Im working on Oracle 12c, I have a requirement to fetch the minimum effective date based on group by of columns, however for only period until there is no change, example below :
Assume we have Employee ID who changes departments over period of time and we want to capture the minimum effective date per department as below but minimum effective date only until the change.
EMP_ID | EFF_DT | DEPT_ID |
---|---|---|
100 | 01/01/2015 | ENGINEERING |
100 | 01/01/2016 | ENGINEERING |
100 | 01/01/2017 | ENGINEERING |
100 | 01/01/2018 | FINANCE |
100 | 01/01/2019 | FINANCE |
100 | 01/01/2020 | ENGINEERING |
100 | 01/01/2021 | ENGINEERING |
Target:
EMP_ID | EFF_DT | DEPT_ID |
---|---|---|
100 | 01/01/2015 | ENGINEERING |
100 | 01/01/2018 | FINANCE |
100 | 01/01/2020 | ENGINEERING |
How to accomplish ? I tried doing LAG and trying to compare current and previous but not able to identify the minimum during the timeframe of no change.
Solution using match_recognize
(available since version 12.1 of the database)
Setting up data for testing:
alter session set nls_date_format='mm/dd/yyyy';
create table my_table (emp_id, eff_dt, dept_id) as
select 100, to_date('01/01/2015'), 'ENGINEERING' from dual union all
select 100, to_date('01/01/2016'), 'ENGINEERING' from dual union all
select 100, to_date('01/01/2017'), 'ENGINEERING' from dual union all
select 100, to_date('01/01/2018'), 'FINANCE' from dual union all
select 100, to_date('01/01/2019'), 'FINANCE' from dual union all
select 100, to_date('01/01/2020'), 'ENGINEERING' from dual union all
select 100, to_date('01/01/2021'), 'ENGINEERING' from dual
;
Query and output:
select emp_id, eff_dt, dept_id
from my_table
match_recognize(
partition by emp_id
order by eff_dt
all rows per match
pattern ( a {- b* -} )
define b as dept_id = a.dept_id
);
EMP_ID EFF_DT DEPT_ID
---------- ---------- -----------
100 01/01/2015 ENGINEERING
100 01/01/2018 FINANCE
100 01/01/2020 ENGINEERING
Brief explanation:
The match_recognize
clause partitions the input rows by emp_id
and orders them by eff_dt
. Then it further divides the rows, within each partition and following the date order, into "matches", matching the pattern given in the pattern
clause. That is a single a
row followed by 0 or more b
rows, where b
is defined (in the define
clause) to require that the dept_id
be the same as in the first row of the match. There are no conditions for a row to be an a
row; any row, if it cannot be classified as b
, will be classified as a
(and it will start a new match!)
"All" rows in a match are returned, except those enclosed in {- -}
in the pattern
clause. That is: the a
row (first row) in each match is returned, while the b
rows are not. Exactly as specified.
EDIT
For Oracle version 11.2 or lower, and also for database products (other than Oracle) that don't yet support match_recognize
, this can be done with analytic functions, which are supported by the vast majority of databases.
The version below does almost exactly the same things as the match_recognize
solution:
select emp_id, eff_dt, dept_id
from (
select emp_id, eff_dt, dept_id,
case when lag(dept_id) over (partition by emp_id
order by eff_dt) = dept_id
then 'B' else 'A' end as classifier
from my_table
)
where classifier = 'A'
;