Search code examples
sql-servert-sqlwindow-functionsgaps-and-islands

T-SQL: Find maximum concurrent number of people per day with null values


I have a table that stores people's sign-ins and sign-outs in rooms. One of them logs in and tells the system how many people they are in total. As long as they don't log out, the value in the LogOut column remains NULL. This is a query for a report I have to do but I am stuck.

I want to find the maximum concurrent number of people per day in each room in a given time span. However, I also want to count the number of people that are NULL, since they are actually still logged in.

I created this query based on several answers on Stackoverflow. Unfortunately I haven't figured out how to get the number of people with NULL values ​​on a day when no one logs into that room. The reason might be the cumulative total, but how do I work around this issue?

Here is an example table:

CREATE TABLE Registrations(
    [RoomName] VARCHAR(32),
    [LogIn] DATETIME,
    [LogOut] DATETIME,
    [NumberOfPeople] INT
)

INSERT INTO Registrations (RoomName, LogIn, LogOut, NumberOfPeople)
VALUES
('MainHall', '2022-01-31 08:00:00.000', NULL, 5),
('MainHall', '2022-01-31 08:00:00.000', '2022-01-31 10:00:00.000', 5),
('MainHall', '2022-01-31 09:00:00.000', '2022-01-31 11:00:00.000', 2),
('MainHall', '2022-01-31 12:00:00.000', '2022-01-31 14:00:00.000', 2),
('SmallRoom', '2022-01-31 08:00:00.000', NULL, 3),
('SmallRoom', '2022-01-31 10:00:00.000', '2022-01-31 12:00:00.000', 3),
('SmallRoom', '2022-01-31 11:00:00.000', '2022-01-31 14:00:00.000', 3),
('MainHall', '2022-02-01 08:00:00.000', NULL, 5),
('MainHall', '2022-02-01 08:00:00.000', '2022-02-01 10:00:00.000', 5),
('MainHall', '2022-02-01 09:00:00.000', '2022-02-01 11:00:00.000', 2),
('MainHall', '2022-02-01 12:00:00.000', '2022-02-01 14:00:00.000', 2)

This is how the first entries of MainHall on 2022-01-31 should look like:

-- A    |-------------------
-- B    |-----|
-- C       |-----|
-- D                |-----|

This is the result I want to get:

Day         |    Room     | MaxNumberOfConcurrentLoggedInPeople 
------------+-------------+-------------------------------------
2022-01-31  |  MainHall   |                 12 
2022-01-31  |  SmallRoom  |                 9
2022-02-01  |  MainHall   |                 17
2022-02-01  |  SmallRoom  |                 3

This is my query:

SELECT  ct.RoomName,
        ct.DatePeak,
        ct.RegistrationPeak
FROM
(
    SELECT  cte.RoomName,
            CONVERT(DATE, cte.DatePeak) AS DatePeak,
            MAX(cte.RegistrationPeak) AS RegistrationPeak
    FROM
        (
        SELECT  MarkedDateTime.RoomName,
                SUM(SUM(MarkedDateTime.NumberOfPeople)) OVER (PARTITION BY MarkedDateTime.RoomName ORDER BY MarkedDateTime.DatePeak) AS RegistrationPeak,
                CONVERT(DATE, MarkedDateTime.DatePeak) AS DatePeak
        FROM
            (
                SELECT      RoomName,
                            NumberOfPeople,
                            LogIn AS DatePeak

                FROM        Registrations

                WHERE       (LogOut >=  '2022-01-31 00:00:00.000' OR LogOut IS NULL)
                AND         LogIn < DATEADD(DAY, 1, '2022-02-01 00:00:00.000')

                UNION ALL

                SELECT      RoomName,
                            -NumberOfPeople,
                            COALESCE(LogOut, DATEADD(MILLISECOND, - 3, DATEADD(DAY, 1, '2022-02-01 00:00:00.000'))) AS DatePeak

                FROM        Registrations

                WHERE       (LogOut >=  '2022-01-31 00:00:00.000' OR LogOut IS NULL)
                AND         LogIn < DATEADD(DAY, 1, '2022-02-01 00:00:00.000')
            ) AS MarkedDateTime

        GROUP BY RoomName, DatePeak
        ) AS cte

    GROUP BY DatePeak, RoomName
    ) AS ct

WHERE   ct.DatePeak <= '2022-02-01 00:00:00.000'
AND     ct.DatePeak >= '2022-01-31 00:00:00.000'

EDIT: I added the solution and a small visual help. 2. EDIT: Fixed the wrong dates in the provided test table.


Solution

  • Your data doesn't seem to quite match your expected output, and some of the data doesn't even make sense (logouts before logins), but I think you can solve it like this:

    • Unpivot the log-ins and -outs into separate events, with the quantity positive or negative respectively. Exclude nulls.
    • Calculate a running sum of the quantity, partitioned by room and date, ordered by time.
    • Then simply group by room and day, and take the maximum of the running sum.
    SELECT
      evnt.Day,
      evnt.RoomName,
      MaxNumberOfConcurrentLoggedInPeople = MAX(evnt.RunningSum)
    FROM (
        SELECT
          r.RoomName,
          Day = CAST(v.EventDateTime AS date),
          v.DiffQuantity,
          RunningSum = SUM(v.DiffQuantity) OVER (PARTITION BY r.RoomName, CAST(v.EventDateTime AS date) ORDER BY v.EventDateTime ROWS UNBOUNDED PRECEDING)
        FROM Registrations r
        CROSS APPLY (VALUES
            (r.LogIn, r.NumberOfPeople),
            (r.LogOut, -r.NumberOfPeople)
        ) v(EventDateTime, DiffQuantity)
        WHERE v.EventDateTime IS NOT NULL
    ) evnt
    GROUP BY
      evnt.RoomName,
      evnt.Day;
    

    db<>fiddle