I have asked 'Gaps and Islands' questions in the past, but this one is significantly different. I have an interesting question in Microsoft SQL Server 2016 database, T-SQL language. (Refer to image files and T-SQL scripts with temp tables)
I have a table called #EmployeeManagersSource (T-SQL scripts added, see image file at the bottom).
CREATE TABLE #EmployeeManagersSource
(
EmployeeName varchar(50),
EmployeeId int,
ManagerId int,
ManagerName varchar(50),
StartDate date,
EndDate date
);
INSERT INTO #EmployeeManagersSource
VALUES
('Andrew', 2367, 311, 'James', '1/1/2017', '1/31/2021'),
('Andrew', 2367, 411, 'Alex', '2/1/2021', '7/3/2021'),
('Andrew', 2367, 555, 'Sam', '7/4/2021', '2/27/2022'),
('Andrew', 2367, 444, 'Martin', '7/5/2021', '2/27/2022'),
('Andrew', 2367, 677, 'Frank', '2/28/2022', '3/5/2022'),
('Andrew', 2367, 977, 'Whitney', '2/28/2022', '3/5/2022'),
('Andrew', 2367, 845, 'Joe', '3/6/2022', '3/15/2022'),
('Andrew', 2367, 652, 'Don', '3/9/2022', '3/12/2022'),
('Andrew', 2367, 559, 'Dan', '3/16/2022', '3/19/2022'),
('Andrew', 2367, 439, 'Autumn', '3/20/2022', '3/24/2022'),
('Andrew', 2367, 567, 'Melissa', '3/20/2022', '3/26/2022'),
('Andrew', 2367, 233, 'Ben', '3/27/2022', '3/30/2022'),
('Andrew', 2367, 399, 'Lisa', '3/31/2022', '4/8/2022'),
('Andrew', 2367, 555, 'Sam', '4/4/2022', '4/13/2022'),
('Andrew', 2367, 677, 'Frank', '4/14/2022', '4/14/2022'),
('Andrew', 2367, 311, 'James', '4/15/2022', '4/30/2022'),
('Andrew', 2367, 439, 'Autumn', '4/19/2022', '4/26/2022'),
('Andrew', 2367, 399, 'Lisa', '4/24/2022', '5/3/2022')
It has a list of employees with their managers. An employee is uniquely identified by the EmployeeId column, and likewise, a manager is uniquely identified by the ManagerId column.
I have a sample of this table, using just one employee with the name Andrew, and his unique identification (i.e. EmployeeId column value ) is 2367. (The #EmployeeManagersSource table has many employees in the real world.)
Andrew has many managers over the years. Andrew works in multiple departments of the company at the same time, so he may report to multiple managers at the same time. If at a point of time, he works only in one department, he will obviously have one manager.
Run the following query in T-SQL to get an idea:
SELECT *
FROM #EmployeeManagersSource
(Ignore the fact that some dates are in the future, the data is fictitious).
I need to develop a dataset as below (T-SQL scripts added, see image file at the bottom):
CREATE TABLE #EmployeeManagersDesiredOutput
(
EmployeeName varchar(50),
EmployeeId int,
ManagerId int,
ManagerName varchar(50),
StartDate date,
EndDate date
);
INSERT INTO #EmployeeManagersDesiredOutput
VALUES
('Andrew', 2367, 311, 'James', '1/1/2017', '1/31/2021'),
('Andrew', 2367, 411, 'Alex', '2/1/2021', '7/3/2021'),
('Andrew', 2367, 555, 'Sam', '7/4/2021', '7/4/2021'),
('Andrew', 2367, 555, 'Sam', '7/5/2021', '2/27/2022'),
('Andrew', 2367, 444, 'Martin', '7/5/2021', '2/27/2022'),
('Andrew', 2367, 677, 'Frank', '2/28/2022', '3/5/2022'),
('Andrew', 2367, 977, 'Whitney', '2/28/2022', '3/5/2022'),
('Andrew', 2367, 845, 'Joe', '3/6/2022', '3/8/2022'),
('Andrew', 2367, 845, 'Joe', '3/9/2022', '3/12/2022'),
('Andrew', 2367, 652, 'Don', '3/9/2022', '3/12/2022'),
('Andrew', 2367, 845, 'Joe', '3/13/2022', '3/15/2022'),
('Andrew', 2367, 559, 'Dan', '3/16/2022', '3/19/2022'),
('Andrew', 2367, 439, 'Autumn', '3/20/2022', '3/24/2022'),
('Andrew', 2367, 567, 'Melissa', '3/20/2022', '3/24/2022'),
('Andrew', 2367, 567, 'Melissa', '3/25/2022', '3/26/2022'),
('Andrew', 2367, 233, 'Ben', '3/27/2022', '3/30/2022'),
('Andrew', 2367, 399, 'Lisa', '3/31/2022', '4/3/2022'),
('Andrew', 2367, 399, 'Lisa', '4/4/2022', '4/8/2022'),
('Andrew', 2367, 555, 'Sam', '4/4/2022', '4/8/2022'),
('Andrew', 2367, 555, 'Sam', '4/9/2022', '4/13/2022'),
('Andrew', 2367, 677, 'Frank', '4/14/2022', '4/14/2022'),
('Andrew', 2367, 311, 'James', '4/15/2022', '4/18/2022'),
('Andrew', 2367, 311, 'James', '4/19/2022', '4/23/2022'),
('Andrew', 2367, 439, 'Autumn', '4/19/2022', '4/23/2022'),
('Andrew', 2367, 311, 'James', '4/24/2022', '4/26/2022'),
('Andrew', 2367, 439, 'Autumn', '4/24/2022', '4/26/2022'),
('Andrew', 2367, 399, 'Lisa', '4/24/2022', '4/26/2022'),
('Andrew', 2367, 311, 'James', '4/27/2022', '4/30/2022'),
('Andrew', 2367, 399, 'Lisa', '4/27/2022', '4/30/2022'),
('Andrew', 2367, 399, 'Lisa', '5/1/2022', '5/3/2022')
SELECT *
FROM #EmployeeManagersDesiredOutput
The lowest level of granularity for time in this scenario is a calendar day.
If Andrew reports to multiple managers on a given calendar day, it must be shown in separate rows with the overlapping period and the range associated. Please run the following queries to get an idea.
SELECT *
FROM #EmployeeManagersDesiredOutput
WHERE StartDate >= '7/5/2021'
AND EndDate <= '2/27/2022'
SELECT *
FROM #EmployeeManagersDesiredOutput
WHERE StartDate >= '3/9/2022'
AND EndDate <= '3/12/2022'
SELECT *
FROM #EmployeeManagersDesiredOutput
WHERE StartDate >= '3/20/2022'
AND EndDate <= '3/24/2022'
SELECT *
FROM #EmployeeManagersDesiredOutput
WHERE StartDate >= '4/4/2022'
AND EndDate <= '4/8/2022'
SELECT *
FROM #EmployeeManagersDesiredOutput
WHERE StartDate >= '4/19/2022'
AND EndDate <= '4/23/2022'
SELECT *
FROM #EmployeeManagersDesiredOutput
WHERE StartDate >= '4/24/2022'
AND EndDate <= '4/26/2022'
SELECT *
FROM #EmployeeManagersDesiredOutput
WHERE StartDate >= '4/27/2022'
AND EndDate <= '4/30/2022'
Other rows that do not belong to the overlap must be separated, by adding or subtracting a day.
For example:
SELECT *
FROM #EmployeeManagersDesiredOutput
WHERE StartDate = '7/4/2021'
AND EndDate = '7/4/2021'
SELECT *
FROM #EmployeeManagersDesiredOutput
WHERE StartDate >= '3/6/2022'
AND EndDate <= '3/8/2022'
SELECT *
FROM #EmployeeManagersDesiredOutput
WHERE StartDate >= '3/13/2022'
AND EndDate <= '3/15/2022'
SELECT *
FROM #EmployeeManagersDesiredOutput
WHERE StartDate >= '3/31/2022'
AND EndDate <= '4/3/2022'
SELECT *
FROM #EmployeeManagersDesiredOutput
WHERE StartDate >= '4/15/2022'
AND EndDate <= '4/18/2022'
SELECT *
FROM #EmployeeManagersDesiredOutput
WHERE StartDate >= '5/1/2022'
AND EndDate <= '5/3/2022'
I have the following criteria (for any given employee, such as Andrew) (refer to the #EmployeeManagersSource table):
In case (6) and (7), data can be moved from the #EmployeeManagersSource table to the #EmployeeManagersDesiredOutput without any change.
Any idea on how to convert #EmployeeManagersSource to #EmployeeManagersDesiredOutput ?
My approach:
Develop a temp table (#EmployeeManagersIntermediate) with exploded dates, using the #EmployeeManagersSource table as source:
;
WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1)
,E02(N) AS (SELECT 1 FROM E00 a, E00 b)
,E04(N) AS (SELECT 1 FROM E02 a, E02 b)
,E08(N) AS (SELECT 1 FROM E04 a, E04 b)
,E16(N) AS (SELECT 1 FROM E08 a, E08 b)
,E32(N) AS (SELECT 1 FROM E16 a, E16 b)
,cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E32)
,DateRange AS
(
SELECT ExplodedDate = DATEADD(DAY,N - 1,'1960-01-01')
FROM cteTally
WHERE N <= 365000
)
SELECT EmployeeName, EmployeeId, ManagerId, ManagerName, StartDate, EndDate, CONVERT(date,ExplodedDate) AS ExplodedDate
INTO #EmployeeManagersIntermediate
FROM #EmployeeManagersSource eh
JOIN DateRange d ON d.ExplodedDate >= eh.[StartDate]
AND d.ExplodedDate <= eh.[EndDate];
SELECT *
FROM #EmployeeManagersIntermediate
WHERE ManagerName = 'Lisa'
However, am not able to get the #EmployeeManagersDesiredOutput using #EmployeeManagersIntermediate, based on this 'Gaps and Islands' problem here.
I think I need a proper PARTITION BY clause. Can someone suggest a solution on how change #EmployeeManagersIntermediate to #EmployeeManagersDesiredOutput ?
You can use the following query:
;WITH Dates AS (
SELECT EmployeeId, EmployeeName, SomeDate,
ROW_NUMBER() OVER (PARTITION BY x.EmployeeId ORDER BY x.SomeDate) AS RowNum
FROM (
SELECT EmployeeId, EmployeeName, StartDate AS SomeDate
FROM #EmployeeManagersSource
UNION
SELECT EmployeeId, EmployeeName, DATEADD(DAY,1,EndDate)
FROM #EmployeeManagersSource
) x
), Intervals AS (
SELECT d1.EmployeeId, d1.EmployeeName,
d1.SomeDate AS StartDate, DATEADD(DAY,-1,d2.SomeDate) AS EndDate
FROM Dates d1
INNER JOIN Dates d2
ON d2.EmployeeId = d1.EmployeeId AND d1.RowNum=d2.RowNum-1
)
SELECT i.EmployeeName, i.EmployeeId, s.ManagerId, s.ManagerName, i.StartDate, i.EndDate
FROM Intervals i
INNER JOIN #EmployeeManagersSource s
ON s.EmployeeId = i.EmployeeId AND s.StartDate<=i.StartDate AND s.EndDate>=i.EndDate
The first CTE builds a list of dates when changes appear, the second CTE builds the intervals, the final query finds the mangers for each interval.