Search code examples
datet-sqlsql-server-2016gaps-and-islandspartition-by

Microsoft SQL Server 2016 - T-SQL puzzle - overlapping date ranges in segregated rows - 'Gaps & Islands' problem


I have asked 'Gaps and Islands' questions in the past, but this one is significantly different. I have an interesting question in Microsoft SQL Server 2016 database, T-SQL language. (Refer to image files and T-SQL scripts with temp tables)

I have a table called #EmployeeManagersSource (T-SQL scripts added, see image file at the bottom).

CREATE TABLE #EmployeeManagersSource

(

EmployeeName varchar(50),
EmployeeId int,
ManagerId int,
ManagerName varchar(50),
StartDate date,
EndDate date

);



INSERT INTO #EmployeeManagersSource

VALUES

('Andrew',  2367,   311,    'James',   '1/1/2017',  '1/31/2021'),
('Andrew',  2367,   411,    'Alex',    '2/1/2021',  '7/3/2021'),
('Andrew',  2367,   555,    'Sam',     '7/4/2021',  '2/27/2022'),
('Andrew',  2367,   444,    'Martin',  '7/5/2021',  '2/27/2022'),
('Andrew',  2367,   677,    'Frank',   '2/28/2022', '3/5/2022'),
('Andrew',  2367,   977,    'Whitney', '2/28/2022', '3/5/2022'),
('Andrew',  2367,   845,    'Joe',     '3/6/2022',  '3/15/2022'),
('Andrew',  2367,   652,    'Don',     '3/9/2022',  '3/12/2022'),
('Andrew',  2367,   559,    'Dan',     '3/16/2022', '3/19/2022'),
('Andrew',  2367,   439,    'Autumn',  '3/20/2022', '3/24/2022'),
('Andrew',  2367,   567,    'Melissa', '3/20/2022', '3/26/2022'),
('Andrew',  2367,   233,    'Ben',     '3/27/2022', '3/30/2022'),
('Andrew',  2367,   399,    'Lisa',    '3/31/2022', '4/8/2022'),
('Andrew',  2367,   555,    'Sam',     '4/4/2022',  '4/13/2022'),
('Andrew',  2367,   677,    'Frank',   '4/14/2022', '4/14/2022'),
('Andrew',  2367,   311,    'James',   '4/15/2022', '4/30/2022'),
('Andrew',  2367,   439,    'Autumn',  '4/19/2022', '4/26/2022'),
('Andrew',  2367,   399,    'Lisa',    '4/24/2022', '5/3/2022')

It has a list of employees with their managers. An employee is uniquely identified by the EmployeeId column, and likewise, a manager is uniquely identified by the ManagerId column.

I have a sample of this table, using just one employee with the name Andrew, and his unique identification (i.e. EmployeeId column value ) is 2367. (The #EmployeeManagersSource table has many employees in the real world.)

Andrew has many managers over the years. Andrew works in multiple departments of the company at the same time, so he may report to multiple managers at the same time. If at a point of time, he works only in one department, he will obviously have one manager.

Run the following query in T-SQL to get an idea:

SELECT *
FROM #EmployeeManagersSource

(Ignore the fact that some dates are in the future, the data is fictitious).

I need to develop a dataset as below (T-SQL scripts added, see image file at the bottom):

CREATE TABLE #EmployeeManagersDesiredOutput

(

EmployeeName varchar(50),
EmployeeId int,
ManagerId int,
ManagerName varchar(50),
StartDate date,
EndDate date

);


INSERT INTO #EmployeeManagersDesiredOutput

VALUES

('Andrew',  2367,   311,    'James',   '1/1/2017',  '1/31/2021'),
('Andrew',  2367,   411,    'Alex',    '2/1/2021',  '7/3/2021'),
('Andrew',  2367,   555,    'Sam',     '7/4/2021',  '7/4/2021'),
('Andrew',  2367,   555,    'Sam',     '7/5/2021',  '2/27/2022'),
('Andrew',  2367,   444,    'Martin',  '7/5/2021',  '2/27/2022'),
('Andrew',  2367,   677,    'Frank',   '2/28/2022', '3/5/2022'),
('Andrew',  2367,   977,    'Whitney', '2/28/2022', '3/5/2022'),
('Andrew',  2367,   845,    'Joe',     '3/6/2022',  '3/8/2022'),
('Andrew',  2367,   845,    'Joe',     '3/9/2022',  '3/12/2022'),
('Andrew',  2367,   652,    'Don',     '3/9/2022',  '3/12/2022'),
('Andrew',  2367,   845,    'Joe',     '3/13/2022', '3/15/2022'),
('Andrew',  2367,   559,    'Dan',     '3/16/2022', '3/19/2022'),
('Andrew',  2367,   439,    'Autumn',  '3/20/2022', '3/24/2022'),
('Andrew',  2367,   567,    'Melissa', '3/20/2022', '3/24/2022'),
('Andrew',  2367,   567,    'Melissa', '3/25/2022', '3/26/2022'),
('Andrew',  2367,   233,    'Ben',     '3/27/2022', '3/30/2022'),
('Andrew',  2367,   399,    'Lisa',    '3/31/2022', '4/3/2022'),
('Andrew',  2367,   399,    'Lisa',    '4/4/2022',  '4/8/2022'),
('Andrew',  2367,   555,    'Sam',     '4/4/2022',  '4/8/2022'),
('Andrew',  2367,   555,    'Sam',     '4/9/2022',  '4/13/2022'),
('Andrew',  2367,   677,    'Frank',   '4/14/2022', '4/14/2022'),
('Andrew',  2367,   311,    'James',   '4/15/2022', '4/18/2022'),
('Andrew',  2367,   311,    'James',   '4/19/2022', '4/23/2022'),
('Andrew',  2367,   439,    'Autumn',  '4/19/2022', '4/23/2022'),
('Andrew',  2367,   311,    'James',   '4/24/2022', '4/26/2022'),
('Andrew',  2367,   439,    'Autumn',  '4/24/2022', '4/26/2022'),
('Andrew',  2367,   399,    'Lisa',    '4/24/2022', '4/26/2022'),
('Andrew',  2367,   311,    'James',   '4/27/2022', '4/30/2022'),
('Andrew',  2367,   399,    'Lisa',    '4/27/2022', '4/30/2022'),
('Andrew',  2367,   399,    'Lisa',    '5/1/2022',  '5/3/2022')

SELECT *
FROM #EmployeeManagersDesiredOutput

The lowest level of granularity for time in this scenario is a calendar day.

If Andrew reports to multiple managers on a given calendar day, it must be shown in separate rows with the overlapping period and the range associated. Please run the following queries to get an idea.

SELECT *
FROM #EmployeeManagersDesiredOutput
WHERE StartDate >= '7/5/2021'
AND EndDate <= '2/27/2022'

SELECT *
FROM #EmployeeManagersDesiredOutput
WHERE StartDate >= '3/9/2022'
AND EndDate <= '3/12/2022'

SELECT *
FROM #EmployeeManagersDesiredOutput
WHERE StartDate >= '3/20/2022'
AND EndDate <= '3/24/2022'

SELECT *
FROM #EmployeeManagersDesiredOutput
WHERE StartDate >= '4/4/2022'
AND EndDate <= '4/8/2022'

SELECT *
FROM #EmployeeManagersDesiredOutput
WHERE StartDate >= '4/19/2022'
AND EndDate <= '4/23/2022'

SELECT *
FROM #EmployeeManagersDesiredOutput
WHERE StartDate >= '4/24/2022'
AND EndDate <= '4/26/2022'

SELECT *
FROM #EmployeeManagersDesiredOutput
WHERE StartDate >= '4/27/2022'
AND EndDate <= '4/30/2022'

Other rows that do not belong to the overlap must be separated, by adding or subtracting a day.

For example:

SELECT *
FROM #EmployeeManagersDesiredOutput
WHERE StartDate = '7/4/2021'
AND EndDate = '7/4/2021'


SELECT *
FROM #EmployeeManagersDesiredOutput
WHERE StartDate >= '3/6/2022'
AND EndDate <= '3/8/2022'

SELECT *
FROM #EmployeeManagersDesiredOutput
WHERE StartDate >= '3/13/2022'
AND EndDate <= '3/15/2022'

SELECT *
FROM #EmployeeManagersDesiredOutput
WHERE StartDate >= '3/31/2022'
AND EndDate <= '4/3/2022'


SELECT *
FROM #EmployeeManagersDesiredOutput
WHERE StartDate >= '4/15/2022'
AND EndDate <= '4/18/2022'

SELECT *
FROM #EmployeeManagersDesiredOutput
WHERE StartDate >= '5/1/2022'
AND EndDate <= '5/3/2022'

I have the following criteria (for any given employee, such as Andrew) (refer to the #EmployeeManagersSource table):

  1. StartDate is the same for multiple managers (eg: Autumn and Melissa, Mar 20,2022), but EndDate varies
  2. StartDate varies for multiple managers, but EndDate is the same for those managers (eg: Sam and Martin, Feb 27, 2022).
  3. Partial overlap between StartDate and EndDate between multiple managers. (eg: Lisa (Mar 31, 2022 to Apr 8, 2022) and Sam (Apr 4, 2022 to Apr 13, 2022))
  4. StartDate and EndDate of a manager are entirely a subset of those of another manager. (eg: Joe (Mar 6, 2022 to Mar 15, 2022) and Don (Mar 9, 2022 to Mar 12, 2022))
  5. Sometimes even 3 managers handle at the same time (eg: James, Autumn, Lisa (Apr 24, 2022 to Apr 26, 2022))
  6. StartDate and EndDate of a manager exactly matches another manager (eg: Frank and Whitney, Feb 28, 2022 to Mar 5, 2022)
  7. Just one manager in a period (eg: Dan, Mar 16, 2022 to Mar 19, 2022)

In case (6) and (7), data can be moved from the #EmployeeManagersSource table to the #EmployeeManagersDesiredOutput without any change.

Any idea on how to convert #EmployeeManagersSource to #EmployeeManagersDesiredOutput ?

enter image description here

enter image description here

My approach:

Develop a temp table (#EmployeeManagersIntermediate) with exploded dates, using the #EmployeeManagersSource table as source:

;
WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1)
    ,E02(N) AS (SELECT 1 FROM E00 a, E00 b)
    ,E04(N) AS (SELECT 1 FROM E02 a, E02 b)
    ,E08(N) AS (SELECT 1 FROM E04 a, E04 b)
    ,E16(N) AS (SELECT 1 FROM E08 a, E08 b)
    ,E32(N) AS (SELECT 1 FROM E16 a, E16 b)
    ,cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E32)
    ,DateRange AS
(
    SELECT ExplodedDate = DATEADD(DAY,N - 1,'1960-01-01')
    FROM cteTally
    WHERE N <= 365000
)
SELECT EmployeeName, EmployeeId, ManagerId, ManagerName, StartDate, EndDate, CONVERT(date,ExplodedDate) AS ExplodedDate
INTO #EmployeeManagersIntermediate
FROM #EmployeeManagersSource eh
JOIN DateRange d ON d.ExplodedDate >= eh.[StartDate]
 AND d.ExplodedDate <= eh.[EndDate];
 

 SELECT *
 FROM #EmployeeManagersIntermediate
 WHERE ManagerName = 'Lisa'

However, am not able to get the #EmployeeManagersDesiredOutput using #EmployeeManagersIntermediate, based on this 'Gaps and Islands' problem here.

I think I need a proper PARTITION BY clause. Can someone suggest a solution on how change #EmployeeManagersIntermediate to #EmployeeManagersDesiredOutput ?


Solution

  • You can use the following query:

    ;WITH Dates AS (
        SELECT EmployeeId, EmployeeName, SomeDate,
            ROW_NUMBER() OVER (PARTITION BY x.EmployeeId ORDER BY x.SomeDate) AS RowNum
        FROM (
            SELECT EmployeeId, EmployeeName, StartDate AS SomeDate
            FROM #EmployeeManagersSource
            UNION
            SELECT EmployeeId, EmployeeName, DATEADD(DAY,1,EndDate)
            FROM #EmployeeManagersSource
        ) x
    ), Intervals AS (
        SELECT d1.EmployeeId, d1.EmployeeName, 
            d1.SomeDate AS StartDate, DATEADD(DAY,-1,d2.SomeDate) AS EndDate
        FROM Dates d1 
        INNER JOIN Dates d2 
        ON d2.EmployeeId = d1.EmployeeId AND d1.RowNum=d2.RowNum-1
    )
    SELECT i.EmployeeName, i.EmployeeId, s.ManagerId, s.ManagerName, i.StartDate, i.EndDate
    FROM Intervals i
    INNER JOIN #EmployeeManagersSource s 
    ON s.EmployeeId = i.EmployeeId AND s.StartDate<=i.StartDate AND s.EndDate>=i.EndDate
    

    The first CTE builds a list of dates when changes appear, the second CTE builds the intervals, the final query finds the mangers for each interval.