Search code examples
sqlsql-servert-sqlviewminimum

View to get the minimum date with a complicated condition


I have a table in SQL Server like this:

+----------+-----------+------------+
| DateFrom | Completed | EmployeeID |
+----------+-----------+------------+
DateFrom: date not null -- unique for each EmployeeID
Completed: bit not null
EmployeeID: bigint not null
  • Each row belongs to a subperiod defined by a start date and can be completed or not.
  • Each employee can have multiple subperiods.
  • A period is defined by a list of ordered subperiods until the last subperiod is completed.

I want to create a view that will return the start date of the last period for every EmployeeID like this:

  1. If there is no Completed is true, get the minimum DateFrom. [The employee has one period which is still not completed]
+----------+-----------+------------+
| DateFrom | Completed | EmployeeID |
+----------+-----------+------------+
|2021-01-01|   false   |     1      |
|2021-01-05|   false   |     1      |
|2021-01-09|   false   |     1      |
|2021-01-10|   false   |     1      |
|2021-01-07|   false   |     2      |
|2021-01-15|   false   |     2      |
+----------+-----------+------------+

Expected Result:
2021-01-01 for EmployeeID = 1
2021-01-07 for EmployeeID = 2
  1. Else, return the minimum DateFrom after the last Completed is true. [The last period is still not completed]
+----------+-----------+------------+
| DateFrom | Completed | EmployeeID |
+----------+-----------+------------+
|2021-01-01|   false   |     1      |
|2021-01-05|   true    |     1      |
|2021-01-09|   false   |     1      |
|2021-01-10|   false   |     1      |
|2021-01-07|   true    |     2      |
|2021-01-15|   false   |     2      |
+----------+-----------+------------+

Expected Result:
2021-01-09 for EmployeeID = 1
2021-01-15 for EmployeeID = 2
  1. If the maximum DateFrom has Completed=true, return the minimum DateFrom before the last Completed is true and after the true before it, if exists. [The last period is completed with multiple subperiods]
+----------+-----------+------------+
| DateFrom | Completed | EmployeeID |
+----------+-----------+------------+
|2021-01-01|   false   |     1      |
|2021-01-05|   true    |     1      |
|2021-01-09|   false   |     1      |
|2021-01-10|   true    |     1      |
|2021-01-07|   false   |     2      |
|2021-01-15|   true    |     2      |
+----------+-----------+------------+

Expected Result:
2021-01-09 for EmployeeID = 1
2021-01-07 for EmployeeID = 2
  1. If the maximum DateFrom has Completed=true and there is no other rows or the row before it has Completed=true, return the maximum DateFrom. [The last period is completed with one subperiod]
+----------+-----------+------------+
| DateFrom | Completed | EmployeeID |
+----------+-----------+------------+
|2021-01-01|   false   |     1      |
|2021-01-05|   false   |     1      |
|2021-01-09|   true    |     1      |
|2021-01-10|   true    |     1      |
|2021-01-07|   true    |     2      |
+----------+-----------+------------+

Expected Result:
2021-01-10 for EmployeeID = 1
2021-01-07 for EmployeeID = 2

I am looking for the most optimized solution.

I tried this, but I get a NULL value in the third example:

WITH T AS (
    SELECT EmployeeID
        , MAX(CASE WHEN Completed = 0 THEN NULL ELSE DateFrom END) MaxDateFrom 
    FROM TableDates
    GROUP BY EmployeeID
)
SELECT TableDates.EmployeeID, MIN(TableDates.DateFrom) DateFrom
FROM T
LEFT JOIN TableDates ON T.EmployeeID = TableDates.EmployeeID
    AND (T.MaxDateFrom IS NULL OR TableDates.DateFrom > T.MaxDateFrom)
GROUP BY TableDates.EmployeeID

Solution

  • I think you just want conditional aggregation -- with a bunch of logic. Assuming that you have rows for every day, I think this does what you want:

    select employeeid,
           (case when -- case 4
                      min(completed) = max(completed) and
                      min(completed) = 'true'
                 then max(datefrom) 
                 when -- case 1
                      min(completed) = max(completed) and
                      min(completed) = 'false'
                 then min(datefrom) 
                 when -- case 3
                      max(datefrom) = max(case when completed = 'true' then datefrom end)
                 then min(case when completed_seqnum = 1 then datefrom end)
                 else dateadd(day, 1, max(case when completed = 'true' then datefrom end))
            end)
    from (select t.*,
                 sum(case when completed = 'true' then 1 else 0 end) over (partition by employeeid order by datefrom desc) as completed_seqnum
          from t
         ) t
    group by employeeid;
    

    The need for a row every day is really just a convenience -- allowing the code to add one day to get the date after a particular "true" false, for instance. This could also be accomplished using lead() in the subquery.

    Note: This does not handle all conditions (at least with a non-NULL date. For instance, it returns NULL when there is a sequence of "true"s at the end of the data.

    If this is an issue -- this version of your question has been asked. Ask a new question with appropriate sample data and desired results. I also think that you might be able to explain the problem you are trying to solve and simplify the explanation.

    EDIT:

    If dates are missing, you can use:

    select employeeid,
           (case when -- case 4
                      min(completed) = max(completed) and
                      min(completed) = 'true'
                 then max(datefrom) 
                 when -- case 1
                      min(completed) = max(completed) and
                      min(completed) = 'false'
                 then min(datefrom) 
                 when -- case 3
                      max(datefrom) = max(case when completed = 'true' then datefrom end)
                 then min(case when completed_seqnum = 1 then datefrom end)
                 else max(case when completed = 'true' then next_datefrom end)
            end)
    from (select t.*,
                 lead(datefrom) over (partition by employeeid order by datefrom) as next_datefrom,
                 sum(case when completed = 'true' then 1 else 0 end) over (partition by employeeid order by datefrom desc) as completed_seqnum
          from t
         ) t
    group by employeeid;