Search code examples
sqlt-sql

Calculate days in between status change


I'm not sure how to code to get the result I'm looking for. I hope the community can point me in the right direction. I need to run a report against the Rent History table, and I expect the report to look like the Final Result. Please see the image for details. enter image description here There are three tags in the Rent History table. I was able to use ROW_NUMBER() OVER (PARTITION BY) to determine the On/Off Rent start date based on the position. This is where I stopped because I'm not sure what to do next. So far, this is the code that I have.

SELECT tag
,progress_complete_date_time
,create_date
,required_until_date
,modify_date
,on_rent_status
,[date_on-off_rent]
,position
,CASE 
    WHEN position = 1
        THEN [date_on-off_rent]
    END AS [first_time_on-off_rent]
,RANK() OVER(PARTITION BY tag  ORDER BY modify_date) AS [RANK]
FROM CleanUpB

The DateOnRent and DateOffRent columns are based on how the tags start out. Some tags start with DateOnRent and some with DateOffRent. I only need to capture the start and end status of on_rent_status. Here is my formula to calculate DaysOnRent and DaysOffRent.

TAG-000479 DaysOnRent DATEDIFF(DAY,'8/12/2021','4/4/2024') = 966 DATEDIFF(DAY,'04/04/2024','06/10/2024') = 67

TAG-000479 DaysOffRent DATEDIFF(DAY,'04/04/2024','04/04/2024') = 0 DATEDIFF(DAY,'06/10/2024',GETDATE()) = 38

TAG-001832 DaysOnRent DATEDIFF(DAY,'03/28/2024',GETDATE()) = 112

TAG-003420 DaysOnRent DATEDIFF(DAY,'06/24/2024',GETDATE()) = 24


Solution

  • Updated version

    Based on feedback from OP. This is not just a simple min/max situation, it's a gaps and islands problem. In this case, we have to deal with multiple "on_rent" records and potentially multiple "off_rent" records as well as gaps in between "off_rent" and "on_rent".

    This is my solution so far (see bottom of response to populate sample data table):

    WITH cte AS (
        -- Collapse groups to get the start and end rent dates for each
        SELECT x.tag, x.GroupID
            , DateOnRent  = MIN(IIF(x.on_rent_status = 1, x.[date_on-off_rent], NULL)) -- Using min because we want the FIRST time  on_rent was logged
            , DateOffRent = MIN(IIF(x.on_rent_status = 0, x.[date_on-off_rent], NULL)) -- Using min because we want the FIRST time off_rent was logged
        FROM (
            SELECT x.tag, x.[date_on-off_rent], x.on_rent_status
                -- If the status changed from off_rent to on_rent, then we know a new group has started
                -- Returning 1 to the running total will generate a grouping ID that keeps sequential rent_status records together
                , GroupID = SUM(IIF(x.on_rent_status = 1 AND x.prev_on_rent_status = 0, 1, 0)) OVER (PARTITION BY x.tag ORDER BY x.position)
            FROM (
                SELECT tag, on_rent_status, [date_on-off_rent], position
                    -- Getting the previous value will help us later determine whether the status changed
                    , prev_on_rent_status = LAG(on_rent_status) OVER (PARTITION BY tag ORDER BY position)
                FROM #rent_history
            ) x
        ) x
        GROUP BY x.tag, x.GroupID
    )
    SELECT x.tag
        , FirstDateOnRent = MIN(x.DateOnRent)
        , LastDateOffRent = MAX(x.LastDateOffRent) -- NULL if still on_rent
        , DaysOnRent      = SUM(x.DaysOnRent)
        , DaysOffRent     = DATEDIFF(DAY, MAX(x.LastDateOffRent), GETDATE()) -- NULL if still on_rent
    FROM (
        SELECT x.tag, x.DateOnRent, x.DateOffRent
            -- We want the most recent state of DateOffRent, rather than MAX, which means if they are still on_rent, it should return null
            , LastDateOffRent = LAST_VALUE(x.DateOffRent) OVER (PARTITION BY x.tag ORDER BY x.DateOnRent)
            -- Number of DaysOnrent needs to be calc'd at the group level so we don't accidentally include gaps
            , DaysOnRent  = DATEDIFF(DAY, x.DateOnRent, COALESCE(x.DateOffRent, GETDATE()))
        FROM cte x
    ) x
    GROUP BY x.tag;
    

    I tried to comment the code as best I could, so I won't completely dive into how this works line by line.

    But the basic idea is...calculate some sort of grouping ID that can keep sequential events together. In this case I relied on both LAG() and SUM({state change logic}) OVER() to generate that ID.

    Then from there it's just a lot of extra grouping and aggregation to get the final output.

    This new query version will support multiple on_rent records, multiple off_rent records as well as gaps between off_rent and on_rent.


    Old/Original reply

    Based on your explanation, you have a table/dataset called "Rent History", which represents the top dataset in your screenshot.

    And you want a query against that dataset whose output matches the second dataset in your screenshot labeled "Final Result".

    Assuming that's true, this is what I've come up with...problem is, I'm making a bunch of assumptions around business logic that may not be correct, even if it does match your output table...

    SELECT x.tag, x.DateOnRent, x.DateOffRent
        , DaysOnRent  = DATEDIFF(DAY, x.DateOnRent, COALESCE(x.DateOffRent, GETDATE()))
        , DaysOffRent = DATEDIFF(DAY, x.DateOffRent, GETDATE())
    FROM (
        SELECT tag
            , DateOnRent  = MIN(IIF(on_rent_status = 1, [date_on-off_rent], NULL))
            , DateOffRent = MAX(IIF(on_rent_status = 0, [date_on-off_rent], NULL))
        FROM #rent_history
        GROUP BY tag
    ) x
    

    I also don't know how large the underlying datasets are this is running against, so I have no idea how efficient it would be.

    The concept seems pretty simple though...grab the earliest "on_rent" date and the oldest "off_rent" date. Then do the datediff calculations using those?


    For anyone who wants the script I used to generate the sample data...

    CREATE TABLE #rent_history (
        tag                         nvarchar(20) NOT NULL,
        progress_complete_date_time datetime     NOT NULL,
        create_date                 datetime     NOT NULL,
        required_until_date         datetime     NOT NULL,
        modify_date                 datetime     NOT NULL,
        on_rent_status              bit          NOT NULL,
        [date_on-off_rent]          datetime     NOT NULL,
        position                    int          NOT NULL,
        [first_time_on-off_rent]    datetime         NULL,
        [RANK]                      int          NOT NULL,
    );
    
    INSERT INTO #rent_history
    VALUES ('TAG-000479', '08/12/21 00:00', '11/06/23 19:31', '01/31/24 00:00', '11/06/23 19:31', 1, '08/12/21 00:00', 1, '08/12/21 00:00', 1)
         , ('TAG-000479', '08/12/21 00:00', '11/06/23 19:31', '01/31/24 00:00', '11/06/23 19:31', 1, '11/06/23 00:00', 2, NULL            , 1)
         , ('TAG-000479', '08/12/21 00:00', '11/06/23 19:31', '01/31/24 00:00', '11/06/23 19:31', 1, '11/06/23 00:00', 3, NULL            , 1)
         , ('TAG-000479', '08/12/21 00:00', '11/06/23 19:31', '04/01/24 00:00', '04/04/24 05:00', 0, '04/04/24 00:00', 4, NULL            , 4)
         , ('TAG-000479', '08/12/21 00:00', '11/06/23 19:31', '04/01/24 00:00', '04/04/24 05:00', 1, '04/04/24 00:00', 5, NULL            , 5)
         , ('TAG-000479', '08/12/21 00:00', '11/06/23 19:31', '12/31/24 00:00', '04/30/24 15:11', 1, '04/30/24 00:00', 6, NULL            , 6)
         , ('TAG-000479', '08/12/21 00:00', '11/06/23 19:31', '12/31/24 00:00', '06/10/24 17:02', 0, '06/10/24 00:00', 7, NULL            , 7)
         , ('TAG-001832', '03/28/24 00:00', '01/17/24 15:41', '03/31/24 00:00', '03/28/24 03:32', 1, '03/28/24 00:00', 1, '03/28/24 00:00', 1)
         , ('TAG-003420', '06/24/24 00:00', '06/24/24 18:55', '12/15/24 00:00', '06/25/24 15:18', 1, '06/24/24 00:00', 1, '06/24/24 00:00', 1);