Search code examples
sqlt-sqlwindow-functionsgaps-and-islands

T-SQL - Return the most recent hire date for each record


Using T-SQL, I have a table [EMP]:

Id SD ED Status Assignment
1 1/1/2020 1/15/2020 A A1
1 1/16/2020 2/1/2020 T null
1 2/2/2020 3/20/2021 A null
1 3/21/2021 10/1/2022 A B6
3 10/15/2022 5/12/2023 A A1
2 1/3/2022 2/1/2022 A B2
2 2/2/2022 11/1/2023 T null
1 10/2/2022 1/1/2023 T null
1 1/2/2023 5/12/2023 A A1
2 11/2/2023 5/12/2023 A A8

Id is the Employee Id, SD is the Record Effective Start Date, ED is the Record Effective End Date, and Status is the Employment Status (A = Active, T = Terminated)

I am attempting to add two additional columns, one for the employee's original hire date (the earliest SD for each Id with a Status = 'A'), and another for the most recent hire date for that record (the earliest SD after a record where Status = 'T', or the original hire date).

I can gather the original hire date:

    SELECT Id, SD, ED, Status, Assignment, OriginalHireDate
    FROM EMP t1
    INNER JOIN (
        SELECT Id, MIN(SD) AS OriginalHireDate
        FROM EMP
        WHERE Status = 'A'
        GROUP BY Id
    ) t2 ON t1.Id = t2.Id

But I may be overthinking how to grab each record's most recent hire date. The desired output is:

Id SD ED Status Assignment OriginalHireDate RecentHireDate
1 1/1/2020 1/15/2020 A A1 1/1/2020 1/1/2020
1 1/16/2020 2/1/2020 T null 1/1/2020 1/1/2020
1 2/2/2020 3/20/2021 A null 1/1/2020 2/2/2020
1 3/21/2021 10/1/2022 A B6 1/1/2020 2/2/2020
3 10/15/2022 5/12/2023 A A1 10/15/2022 10/15/2022
2 1/3/2022 2/1/2022 A B2 1/3/2022 1/3/2022
2 2/2/2022 11/1/2023 T null 1/3/2022 1/3/2022
1 10/2/2022 1/1/2023 T null 1/1/2020 2/2/2020
1 1/2/2023 5/12/2023 A A1 1/1/2020 1/2/2023
2 11/2/2023 5/12/2023 A A8 1/3/2022 11/2/2023

Solution

  • I understand that your rows represent employment starts and end, as indicatd by the status column: an employment starts with a "A" status, optionnaly continues with more "A" rows, and then ends with a "T" row.

    We can identify employment streaks by comparing the "last" status with the current one; when it transitions from "T" to "A", a new streak starts. Once the islands are defined, we can compute the beginning with just a window min.

    select e.*,
       min(case when status = 'A' then sd end) over(partition by emp) original_hire_date,
       min(case when status = 'A' then sd end) over(partition by emp, grp) recent_hire_date
    from (
        select e.*, 
            sum(case when status = 'A' and lag_status = 'T' then 1 else 0 end) over(partition by emp order by sd) grp
        from (
            select e.*, lag(status) over(partition by emp order by sd) lag_status
            from emp e
        ) e
    ) e
    order by id, sd
    

    Note that this happily handles series of "A" (as shown in your data) and series of "T", as it only increments on "A"/"T" transitions.

    Id SD ED Status Assignment lag_status grp original_hire_date recent_hire_date
    1 2020-01-01 2020-01-15 A A1 null 0 2020-01-01 2020-01-01
    1 2020-01-16 2020-02-01 T null A 0 2020-01-01 2020-01-01
    1 2020-02-02 2021-03-20 A null T 1 2020-01-01 2020-02-02
    1 2021-03-21 2022-10-01 A B6 A 1 2020-01-01 2020-02-02
    1 2022-10-02 2023-01-01 T null A 1 2020-01-01 2020-02-02
    1 2023-01-02 2023-05-12 A A1 T 2 2020-01-01 2023-01-02
    2 2022-01-03 2022-02-01 A B2 null 0 2022-01-03 2022-01-03
    2 2022-02-02 2023-11-01 T null A 0 2022-01-03 2022-01-03
    2 2023-11-02 2023-05-12 A A8 T 1 2022-01-03 2023-11-02
    3 2022-10-15 2023-05-12 A A1 null 0 2022-10-15 2022-10-15

    fiddle