Search code examples
sql-serverfactdirty-data

Consolidating fact records by timestamp


I am in a Microsoft SQL Server database. I would like to consolidate start and end periods for employees. For example, given the following simple table:

create table dbo.foo (
employee_key bigint,
effective_start_date date,
effective_end_date date,
account_name varchar(100));


insert into foo (employee_key,
effective_start_date,
effective_end_date,
account_name)
values (1
    ,'2017-01-01'
    ,'2017-01-31'
    ,'Google')
,(1
    ,'2017-02-01'
    ,'2017-02-28'
    ,'Apple')
,(1
    ,'2017-03-01'
    ,'2017-03-31'
    ,'Google')
,(1
    ,'2017-04-01'
    ,'9999-12-31'
    ,'Google')

The employee has moved accounts a couple times. I would like to maintain the change in February and back again. However, I do not want to see 2 records at the end. Instead, I want the tenure at Google to run from 2017-03-01 to 9999-12-31.

By the way, this is one example. That is, the "broken" records aren't always at the end of the employee's tenure.

Expected results:

employee_key | effective_start_date | effective_end_date | account_name
1            | 2017-01-01           | 2017-01-31         | Google
1            | 2017-02-01           | 2017-02-28         | Apple
1            | 2017-03-01           | 9999-12-31         | Google

Solution

  • You can use row_number and get this results... I used two subqueries but it can be solved without that as well:

    ;With Cte as (
    select *, Bucket = sum(accn) over(partition by employee_key order by effective_start_date) 
        from (
            Select *,case when account_name <> lag(account_name) over(partition by employee_key order by effective_start_date) 
                then 1 else 0 end as AccN
                from foo ) A
        )
        Select top (1) with ties employee_key, Min(effective_start_date) over(partition by employee_key,Bucket) AS [effective_start_date],      
                    Max(effective_end_date) over(partition by employee_key,Bucket) AS [effective_end_date], account_name
        from cte
        order by row_number() over(partition by employee_key, bucket order by effective_start_date)
    

    Output as below:

    +--------------+----------------------+--------------------+--------------+
    | employee_key | Effective_start_Date | Effective_End_date | account_name |
    +--------------+----------------------+--------------------+--------------+
    |            1 | 2017-01-01           | 2017-01-31         | Google       |
    |            1 | 2017-02-01           | 2017-02-28         | Apple        |
    |            1 | 2017-03-01           | 9999-12-31         | Google       |
    +--------------+----------------------+--------------------+--------------+
    

    Demo