Lookup previous values on condition in SQL Server 2017 using a table as a queue

When looking up a row on a table called reason:

machine_id  reason   start_time
001234      moving   10:00:00
001234      parked   10:10:00
001234      moving   10:15:00
001234      NULL     10:20:00
001234      NULL     10:25:00
001234      dumping  10:30:00

009876      parked   10:00:00
009876      NULL     10:10:00
009876      NULL     10:15:00
009876      moving   10:20:00
009876      dumping  10:25:00

For any reason's with a NULL value I need to get the most recent value that wasn't NULL, so 001234 NULL values would both become 'moving', and 009876 NULL values would both become 'parked'.

I would normally solve this with a cross apply like:

SELECT 
    r1.machine_id    
    ,ISNULL(r1.reason,r2.reason) AS reason
    ,r1.start_time
FROM #reason r1
CROSS APPLY(
    SELECT TOP 1
            r2.reason
    FROM #reason r2
    WHERE r2.machine_oid = r1.machine_oid 
    AND r2.start_time < r1.start_time
    AND r2.reason IS NOT NULL
    ORDER BY start_time DESC
) r2

But this table I'm querying is hundreds of thousands of rows (can't modify the source database) and the complexity of the query seems close to n^2.

In C++ I would solve by using a priority queue and discarding items from the list of objects to check that didn't meet the criteria so the complexity is closer to nlogn.

I tried understanding the post about using tables as queues here: http://rusanu.com/2010/03/26/using-tables-as-queues but it was beyond my skill level.

Because it's a very common requirement for my datasets, I was hoping there is an elegant solution that can be applied?

Solution

Something like this:

DECLARE @DataSource TABLE
(
    [machine_id] VARCHAR(6)
   ,[reason] VARCHAR(12)
   ,[start_time] TIME
);

INSERT INTO @DataSource([machine_id], [reason], [start_time])
VALUES ('001234', 'moving', '10:00:00')
      ,('001234', 'parked', '10:10:00')
      ,('001234', 'moving', '10:15:00')
      ,('001234', NULL, '10:20:00')
      ,('001234', NULL, '10:25:00')
      ,('001234', 'dumping', '10:30:00')
      ,('009876', 'parked', '10:00:00')
      ,('009876', NULL, '10:10:00')
      ,('009876', NULL, '10:15:00')
      ,('009876', 'moving', '10:20:00')
      ,('009876', 'dumping',  '10:25:00');


SELECT [machine_id]
      ,[reason] AS [reason_old]
      ,ISNULL([reason], MAX([Reason]) OVER (PARTITION BY [machine_id], [RowID])) AS [reason]
      ,[start_time]
FROM 
(
    SELECT *
          ,SUM(IIF([reason] IS NULL, 0, 1)) OVER (PARTITION BY [machine_id] ORDER BY [start_time] ASC) AS [RowID]
    FROM @DataSource 
) DS
ORDER BY [machine_id]
        ,[start_time];

The idea is to use SUM to group the records with NULL value with the the first record with NOT NULL value.

SELECT *
      ,SUM(IIF([reason] IS NULL, 0, 1)) OVER (PARTITION BY [machine_id] ORDER BY [start_time] ASC) AS [RowID]
FROM @DataSource;

Then, we can simply get the MAX/MIN value for such group as these aggregates ignore NULLs and will return the NOT NULL value.