Search code examples
sqlsql-serverdatedate-range

SQL Query that returns all missing months for date ranges within a year (Microsoft SQL Server)


An employee is hired for different departments. For example, they could be working in Marketing and be paid by Department 0001 and also Department 0002 depending on the hours they do. For this question, hours doesn't matter as I truly only need to find the date ranges when the employee is NOT hired. The below two employees are examples of those.

Employee_ID Department_ID Start_Date End_Date
395961 0001 01/01/2022 03/31/2022
395961 0002 01/01/2022 01/31/2022
395961 0001 06/01/2022 10/31/2022
395961 0001 11/01/2022 12/31/2022
395962 0001 01/01/2022 03/31/2022
395962 0002 01/01/2022 01/31/2022
395962 0002 02/01/2022 04/30/2022
395962 0001 06/01/2022 10/31/2022
395962 0001 11/01/2022 12/31/2022
395963 0001 01/01/2022 05/31/2022
395963 0002 06/01/2022 12/31/2022

Employee_ID 395961 and 395962 should return for the query as both have date range gaps. 395961 does not have a date range from 04/01/2022 until 05/31/2022. 395962 does not have a date range from 05/01/2022 until 05/31/2022. 395963 does NOT have a date range gap.

I had in mind to return the months the employee had date ranges and group them by month as we can get duplicate returns. For example, Employee_ID 395961 would return Jan, Feb, March, Jan, Jun, Jul, Aug, Sep, Oct, Nov, Dec and when we group the months, it would remove the duplicate Jan. Then we check which months are missing and if anything returns, we should consider to return this Employee_ID as result.

I truly have NOT figured out how to get a correct result, but I've done some research to find the below that could help. I wouldn't want to use variables as I want this query applied to 1000's of records.

SELECT  DATENAME(MONTH, DATEADD(MONTH, x.number, @Start_Date)) AS MonthName
FROM    master.dbo.spt_values x
WHERE   x.type = 'P'        
AND     x.number <= DATEDIFF(MONTH, @Start_Date, @End_Date)

Edit: The database is Microsoft SQL Server 2017.


Solution

  • I think this should give you what you are looking for:

    WITH cal (date) AS (
        SELECT @Start_Date
        UNION ALL
        SELECT DATEADD(day, 1, date)
        FROM cal
        WHERE date < @End_Date
    )
    SELECT e.Employee_ID, COUNT(cal.date) AS Days_Not_Hired
    FROM cal
    CROSS JOIN (SELECT DISTINCT Employee_ID FROM employees) e
    LEFT JOIN employees e2
        ON cal.date BETWEEN e2.Start_Date AND e2.End_Date
        AND e.Employee_ID = e2.Employee_ID
    WHERE e2.Employee_ID IS NULL
    GROUP BY e.Employee_ID
    OPTION(MAXRECURSION 365);
    

    db<>fiddle

    This is targeted at SQL Server. It may need tweaking for other dialects.


    To run for all employees for all time you could do:

    WITH cal (date) AS (
        SELECT MIN(Start_Date) FROM employees
        UNION ALL
        SELECT DATEADD(day, 1, date)
        FROM cal
        WHERE date < CAST( GETDATE() AS DATE )
    )
    SELECT e.*, COUNT(cal.date) AS Days_Not_Hired
    FROM cal
    JOIN (
        SELECT
            Employee_ID,
            MIN(Start_Date) AS Min_Start_Date,
            MAX(End_Date) AS Max_End_Date
        FROM employees
        GROUP BY Employee_ID
    ) e
        ON cal.date BETWEEN e.Min_Start_Date AND e.Max_End_Date
    LEFT JOIN employees e2
        ON cal.date BETWEEN e2.Start_Date AND e2.End_Date
        AND e.Employee_ID = e2.Employee_ID
    WHERE e2.Employee_ID IS NULL
    GROUP BY e.Employee_ID, e.Min_Start_Date, e.Max_End_Date
    OPTION(MAXRECURSION 0);
    

    db<>fiddle

    I very rarely use SQL Server so I have no idea how its optimiser is going to deal with this query.