I have a table in SQL Server like this:
+----------+-----------+------------+
| DateFrom | Completed | EmployeeID |
+----------+-----------+------------+
DateFrom: date not null -- unique for each EmployeeID
Completed: bit not null
EmployeeID: bigint not null
I want to create a view that will return the start date of the last period for every EmployeeID like this:
+----------+-----------+------------+
| DateFrom | Completed | EmployeeID |
+----------+-----------+------------+
|2021-01-01| false | 1 |
|2021-01-05| false | 1 |
|2021-01-09| false | 1 |
|2021-01-10| false | 1 |
|2021-01-07| false | 2 |
|2021-01-15| false | 2 |
+----------+-----------+------------+
Expected Result:
2021-01-01 for EmployeeID = 1
2021-01-07 for EmployeeID = 2
+----------+-----------+------------+
| DateFrom | Completed | EmployeeID |
+----------+-----------+------------+
|2021-01-01| false | 1 |
|2021-01-05| true | 1 |
|2021-01-09| false | 1 |
|2021-01-10| false | 1 |
|2021-01-07| true | 2 |
|2021-01-15| false | 2 |
+----------+-----------+------------+
Expected Result:
2021-01-09 for EmployeeID = 1
2021-01-15 for EmployeeID = 2
+----------+-----------+------------+
| DateFrom | Completed | EmployeeID |
+----------+-----------+------------+
|2021-01-01| false | 1 |
|2021-01-05| true | 1 |
|2021-01-09| false | 1 |
|2021-01-10| true | 1 |
|2021-01-07| false | 2 |
|2021-01-15| true | 2 |
+----------+-----------+------------+
Expected Result:
2021-01-09 for EmployeeID = 1
2021-01-07 for EmployeeID = 2
+----------+-----------+------------+
| DateFrom | Completed | EmployeeID |
+----------+-----------+------------+
|2021-01-01| false | 1 |
|2021-01-05| false | 1 |
|2021-01-09| true | 1 |
|2021-01-10| true | 1 |
|2021-01-07| true | 2 |
+----------+-----------+------------+
Expected Result:
2021-01-10 for EmployeeID = 1
2021-01-07 for EmployeeID = 2
I am looking for the most optimized solution.
I tried this, but I get a NULL value in the third example:
WITH T AS (
SELECT EmployeeID
, MAX(CASE WHEN Completed = 0 THEN NULL ELSE DateFrom END) MaxDateFrom
FROM TableDates
GROUP BY EmployeeID
)
SELECT TableDates.EmployeeID, MIN(TableDates.DateFrom) DateFrom
FROM T
LEFT JOIN TableDates ON T.EmployeeID = TableDates.EmployeeID
AND (T.MaxDateFrom IS NULL OR TableDates.DateFrom > T.MaxDateFrom)
GROUP BY TableDates.EmployeeID
I think you just want conditional aggregation -- with a bunch of logic. Assuming that you have rows for every day, I think this does what you want:
select employeeid,
(case when -- case 4
min(completed) = max(completed) and
min(completed) = 'true'
then max(datefrom)
when -- case 1
min(completed) = max(completed) and
min(completed) = 'false'
then min(datefrom)
when -- case 3
max(datefrom) = max(case when completed = 'true' then datefrom end)
then min(case when completed_seqnum = 1 then datefrom end)
else dateadd(day, 1, max(case when completed = 'true' then datefrom end))
end)
from (select t.*,
sum(case when completed = 'true' then 1 else 0 end) over (partition by employeeid order by datefrom desc) as completed_seqnum
from t
) t
group by employeeid;
The need for a row every day is really just a convenience -- allowing the code to add one day to get the date after a particular "true" false, for instance. This could also be accomplished using lead()
in the subquery.
Note: This does not handle all conditions (at least with a non-NULL date. For instance, it returns NULL
when there is a sequence of "true"s at the end of the data.
If this is an issue -- this version of your question has been asked. Ask a new question with appropriate sample data and desired results. I also think that you might be able to explain the problem you are trying to solve and simplify the explanation.
EDIT:
If dates are missing, you can use:
select employeeid,
(case when -- case 4
min(completed) = max(completed) and
min(completed) = 'true'
then max(datefrom)
when -- case 1
min(completed) = max(completed) and
min(completed) = 'false'
then min(datefrom)
when -- case 3
max(datefrom) = max(case when completed = 'true' then datefrom end)
then min(case when completed_seqnum = 1 then datefrom end)
else max(case when completed = 'true' then next_datefrom end)
end)
from (select t.*,
lead(datefrom) over (partition by employeeid order by datefrom) as next_datefrom,
sum(case when completed = 'true' then 1 else 0 end) over (partition by employeeid order by datefrom desc) as completed_seqnum
from t
) t
group by employeeid;