Using T-SQL, I have a table [EMP]:
Id | SD | ED | Status | Assignment |
---|---|---|---|---|
1 | 1/1/2020 | 1/15/2020 | A | A1 |
1 | 1/16/2020 | 2/1/2020 | T | null |
1 | 2/2/2020 | 3/20/2021 | A | null |
1 | 3/21/2021 | 10/1/2022 | A | B6 |
3 | 10/15/2022 | 5/12/2023 | A | A1 |
2 | 1/3/2022 | 2/1/2022 | A | B2 |
2 | 2/2/2022 | 11/1/2023 | T | null |
1 | 10/2/2022 | 1/1/2023 | T | null |
1 | 1/2/2023 | 5/12/2023 | A | A1 |
2 | 11/2/2023 | 5/12/2023 | A | A8 |
Id is the Employee Id, SD is the Record Effective Start Date, ED is the Record Effective End Date, and Status is the Employment Status (A = Active, T = Terminated)
I am attempting to add two additional columns, one for the employee's original hire date (the earliest SD for each Id with a Status = 'A'), and another for the most recent hire date for that record (the earliest SD after a record where Status = 'T', or the original hire date).
I can gather the original hire date:
SELECT Id, SD, ED, Status, Assignment, OriginalHireDate
FROM EMP t1
INNER JOIN (
SELECT Id, MIN(SD) AS OriginalHireDate
FROM EMP
WHERE Status = 'A'
GROUP BY Id
) t2 ON t1.Id = t2.Id
But I may be overthinking how to grab each record's most recent hire date. The desired output is:
Id | SD | ED | Status | Assignment | OriginalHireDate | RecentHireDate |
---|---|---|---|---|---|---|
1 | 1/1/2020 | 1/15/2020 | A | A1 | 1/1/2020 | 1/1/2020 |
1 | 1/16/2020 | 2/1/2020 | T | null | 1/1/2020 | 1/1/2020 |
1 | 2/2/2020 | 3/20/2021 | A | null | 1/1/2020 | 2/2/2020 |
1 | 3/21/2021 | 10/1/2022 | A | B6 | 1/1/2020 | 2/2/2020 |
3 | 10/15/2022 | 5/12/2023 | A | A1 | 10/15/2022 | 10/15/2022 |
2 | 1/3/2022 | 2/1/2022 | A | B2 | 1/3/2022 | 1/3/2022 |
2 | 2/2/2022 | 11/1/2023 | T | null | 1/3/2022 | 1/3/2022 |
1 | 10/2/2022 | 1/1/2023 | T | null | 1/1/2020 | 2/2/2020 |
1 | 1/2/2023 | 5/12/2023 | A | A1 | 1/1/2020 | 1/2/2023 |
2 | 11/2/2023 | 5/12/2023 | A | A8 | 1/3/2022 | 11/2/2023 |
I understand that your rows represent employment starts and end, as indicatd by the status
column: an employment starts with a "A" status, optionnaly continues with more "A" rows, and then ends with a "T" row.
We can identify employment streaks by comparing the "last" status with the current one; when it transitions from "T" to "A", a new streak starts. Once the islands are defined, we can compute the beginning with just a window min.
select e.*,
min(case when status = 'A' then sd end) over(partition by emp) original_hire_date,
min(case when status = 'A' then sd end) over(partition by emp, grp) recent_hire_date
from (
select e.*,
sum(case when status = 'A' and lag_status = 'T' then 1 else 0 end) over(partition by emp order by sd) grp
from (
select e.*, lag(status) over(partition by emp order by sd) lag_status
from emp e
) e
) e
order by id, sd
Note that this happily handles series of "A" (as shown in your data) and series of "T", as it only increments on "A"/"T" transitions.
Id | SD | ED | Status | Assignment | lag_status | grp | original_hire_date | recent_hire_date |
---|---|---|---|---|---|---|---|---|
1 | 2020-01-01 | 2020-01-15 | A | A1 | null | 0 | 2020-01-01 | 2020-01-01 |
1 | 2020-01-16 | 2020-02-01 | T | null | A | 0 | 2020-01-01 | 2020-01-01 |
1 | 2020-02-02 | 2021-03-20 | A | null | T | 1 | 2020-01-01 | 2020-02-02 |
1 | 2021-03-21 | 2022-10-01 | A | B6 | A | 1 | 2020-01-01 | 2020-02-02 |
1 | 2022-10-02 | 2023-01-01 | T | null | A | 1 | 2020-01-01 | 2020-02-02 |
1 | 2023-01-02 | 2023-05-12 | A | A1 | T | 2 | 2020-01-01 | 2023-01-02 |
2 | 2022-01-03 | 2022-02-01 | A | B2 | null | 0 | 2022-01-03 | 2022-01-03 |
2 | 2022-02-02 | 2023-11-01 | T | null | A | 0 | 2022-01-03 | 2022-01-03 |
2 | 2023-11-02 | 2023-05-12 | A | A8 | T | 1 | 2022-01-03 | 2023-11-02 |
3 | 2022-10-15 | 2023-05-12 | A | A1 | null | 0 | 2022-10-15 | 2022-10-15 |