Search code examples
sqlsql-servermissing-datastatus

SQL query to find max value for each day available, and fill in rows for missing days to build a complete table?


I have a table from IBM Maximo called wostatus, which records changes in workorder statuses. There can be mutliple status changes in a single day, and changes are only recorded when they happen.

I need to be able to return a table that shows fills in the missing days in the table with the last known status, going back 2 years.

An example table looks like:

wonum changedate wostatus wostatusid
WO00001 21/3/2023 10:56 APPR 100012
WO00001 21/3/2023 10:58 WAPPR 100014
WO00002 22/3/2023 11:03 APPR 100007
WO00002 24/3/2023 10:56 COMP 100009
WO00003 21/3/2023 10:59 WAPPR 100016
WO00003 21/3/2023 11:01 APPR 100017
WO00003 24/3/2023 10:56 WSCH 100018

I was trying to get a table that would show:

date wonum wostatus
21/03/2023 WO00001 WAPPR
21/03/2023 WO00003 APPR
22/03/2023 WO00001 WAPPR
22/03/2023 WO00003 APPR
22/03/2023 WO00002 APPR
23/03/2023 WO00001 WAPPR
23/03/2023 WO00003 APPR
23/03/2023 WO00002 APPR
24/03/2023 WO00001 WAPPR
24/03/2023 WO00003 WSCH
24/03/2023 WO00002 COMP

I have tried (without trying to fill in the missing rows) to get a work order to show me the maximum status on each day using max(changedate), and max(wostatusid). But I know enough to know I'm out of my depth here. Any help would be very much appreciated.

We use SQL Server.


Solution

  • This kind of problem is usually called gaps and islands, and there are quite a lot of examples here in stackoverflow, but i guess one more won't hurt:

    ;WITH CTE AS (
    SELECT  TOP 1 WITH ties wonum, CONVERT(DATE, changedate , 103) AS date
    ,   changedate
    , wostatus
    FROM    (
        VALUES  (N'WO00001', N'21/3/2023 10:56', N'APPR', 100012)
        ,   (N'WO00001', N'21/3/2023 10:58', N'WAPPR', 100014)
        ,   (N'WO00002', N'22/3/2023 11:03', N'APPR', 100007)
        ,   (N'WO00002', N'24/3/2023 10:56', N'COMP', 100009)
        ,   (N'WO00003', N'21/3/2023 10:59', N'WAPPR', 100016)
        ,   (N'WO00003', N'21/3/2023 11:01', N'APPR', 100017)
        ,   (N'WO00003', N'24/3/2023 10:56', N'WSCH', 100018)
    ) t (wonum,changedate,wostatus,wostatusid)
    ORDER BY row_number() OVER(partition BY wonum, CONVERT(date, changedate , 103)  ORDER BY CONVERT(datetime, changedate , 103) DESC)
    )
    , CTE2 AS (
        SELECT  *
        FROM    (
            VALUES(1),(2),(3),(4), (5), (6), (7),(8), (9), (10)
        ) v(c)
    )
    SELECT  a.*
    FROM    (
        SELECT  *
        ,   DATEDIFF(DAY, LAG(date) OVER(PARTITION BY wonum ORDER BY date), date) AS prevDateDiff
        ,   LAG(wostatus) OVER(PARTITION BY wonum ORDER BY date) AS prevStatus
        ,   LEAD(wostatus) OVER(PARTITION BY wonum ORDER BY date) AS nextStatus
        FROM    cte
        ) dates
    CROSS APPLY (
        SELECT  dates.wonum, dates.wostatus, dates.date, 0 AS calculatedFlag
        UNION ALL
        SELECT  dates.wonum, dates.prevStatus, DATEADD(DAY, -counter, dates.date), 1 AS calculatedFlag
        FROM    (
            select ROW_NUMBER() OVER( ORDER BY @@SPID) AS counter
            from CTE2 c
            cross apply CTE2 c2
            cross apply CTE2 c3
            ) c
        WHERE   c.counter < prevdatediff
        UNION ALL
        SELECT  dates.wonum, dates.wostatus, DATEADD(DAY, c.counter, dates.date), 2 AS calculatedFlag
        FROM    (
            select ROW_NUMBER() OVER( ORDER BY @@SPID) AS counter
            from CTE2 c
            cross apply CTE2 c2
            cross apply CTE2 c3
            ) c
        WHERE   c.counter < DATEDIFF(day, dates.date, GETDATE())
        AND nextStatus IS NULL -- Last workorder
        ) a
    

    Here's one solution to the issue, it's a bit complex but i'll try to explain:

    • First i want to get only the last date per worknumber and day. I use Top 1 with Ties and order by row_number to get only the last date
    • Secondly, since we need to fill out dates, we want to have some sort of series. Usually your database might have a calender table that contains all dates, but maybe you don't, so i'm using a little apply thing that creates it.
    • Third, we have 3 cases
      • The actual worknum date
      • Fill dates backwards from worknum dates
      • Fill dates forward to today's date
    • For each of the above cases, i create a separate UNION ALL inside the CROSS APPLY.
      • First case is easiest, just take the data we have
      • Second case, we want to use our series tables to generate a list of rows for every date that is different from previous dates.
      • Third case is similar to second, but instead we fill up forward. Important to only fill up for the last workorder date, for that i use LEAD(wostatus) to make sure there are no more workorder dates in the future.

    So yeah, it looks a bit involved, but maybe it's possible to understand, feel free to add questions and i'll try to explain a bit more in detail

    EDIT:

    To just use your table, change this part:

    FROM    yourworktable t
    /*(
        VALUES  (N'WO00001', N'21/3/2023 10:56', N'APPR', 100012)
        ,   (N'WO00001', N'21/3/2023 10:58', N'WAPPR', 100014)
        ,   (N'WO00002', N'22/3/2023 11:03', N'APPR', 100007)
        ,   (N'WO00002', N'24/3/2023 10:56', N'COMP', 100009)
        ,   (N'WO00003', N'21/3/2023 10:59', N'WAPPR', 100016)
        ,   (N'WO00003', N'21/3/2023 11:01', N'APPR', 100017)
        ,   (N'WO00003', N'24/3/2023 10:56', N'WSCH', 100018)
    ) t (wonum,changedate,wostatus,wostatusid)
    */