I currently have a table of events with a time stamp and a "state" description of that event. The distribution of these events are rather random--I may have a hundred events in a day, followed by multiple days of inactivity. These events are thrown by individual machines, which can be identified by a machine ID.
The view exposing these events is defined thus:
create view V_Machine_State_Window as
select LOCAL_DATETIME,
MACHINE,
STATE,
datediff(second, LOCAL_DATETIME, lead(LOCAL_DATETIME,1) over(order by MACHINE, LOCAL_DATETIME)) as DURATION,
isnull(left(STATE_TEXT, 60), '') as STATE_TEXT,
cast(LOCAL_DATETIME as date) as STATE_DATE,
cast(LOCAL_DATETIME as time) as STATE_TIME,
lag(STATE,1) over(order by MACHINE, LOCAL_DATETIME) as PREV_STATE,
isnull(left(lag(STATE_TEXT,1) over(order by MACHINE, LOCAL_DATETIME), 60), '') as PREV_STATE_TEXT,
lag(LOCAL_DATETIME,1) over(order by MACHINE, LOCAL_DATETIME) as PREV_STATE_DATESTAMP,
lead(STATE,1) over(order by MACHINE, LOCAL_DATETIME) as NEXT_STATE,
isnull(left(lead(STATE_TEXT,1) over(order by MACHINE, LOCAL_DATETIME), 60), '') as NEXT_STATE_TEXT,
lead(LOCAL_DATETIME,1) over(order by MACHINE, LOCAL_DATETIME) as NEXT_STATE_DATESTAMP
from MACHINE_STATE_TABLE
where LOCAL_DATETIME > DATEADD(day, -400, getdate())
For purposes of reporting, it's important that we have the duration of the state, the previous state it was in if there was one, and the next state it changed to if there is one.
As a requirement for analysis of machine behavior, I need to segregate this data by individual days, including adding a record for the beginning each day with the "state" of the most recent event thrown by that machine. With the table Day being just a list of days so I can cross apply, I'm using the following to add a daily record to every machine:
with m_in_scope
as (
select distinct MACHINE
from V_Machine_State_Window)
select d.Day as LOCAL_DATETIME,
m_in_scope.MACHINE as MACHINE,
NULL as STATE,
NULL as DURATION,
NULL as STATE_TEXT,
d.Day as STATE_DATE,
cast('00:00' as time) as STATE_TIME,
NULL as PREV_STATE,
NULL as PREV_STATE_TEXT,
NULL as PREV_STATE_DATESTAMP,
NULL as NEXT_STATE,
NULL as NEXT_STATE_TEXT,
NULL as NEXT_STATE_DATESTAMP
from Day d
cross join m_in_scope
where d.Day > DATEADD(day, -400, getdate())
and d.Day < getdate()
The thing I'm struggling to do is figure out how to fill in those nulls. The state needs to be the most recent state before the start of the day, and the duration needs to be the number of seconds between the start of the day (00:00) and the datestamp of the next event or the end of the day, whichever is sooner. I'm comfortable doing all the maths here as long as I can get the previous and next records in my statement.
I've tried this as a CTE, but I can't find a rational way to partition.
I could just insert the nulls and run an update using a window expression, but as I specified I might have many of these daily records in a row, so the lag and lead of a window expression would only correctly update the next record after each "real" event, and not the ones after it. Doing this in a cursor to loop over it until there are no nulls seems like a really bad idea for processing time, as I'm looking at a table of approximately 50 million records split over a few thousand machines.
Is there a rational way to craft either a CTE or window expression (or some other alternative I haven't yet thought of) that will give me the previous and next record with a populated state from what I've described above?
NOTE: The server doing this processing is on SQL 2014. Please let me know if the right solution for this problem is version-limited.
Taking what you put in the fiddle, and turning it into DDL/DML:
DECLARE @V_Machine_State_Window TABLE (Local_Datetime DATETIME, Machine NVARCHAR(128), State BIT, Duration INT, State_Text NVARCHAR(20), State_Date DATE, State_time TIME, Prev_State BIT, Prev_State_Text NVARCHAR(20),
Prev_State_Datestamp DATETIME, Next_State BIT, Next_State_Text NVARCHAR(20), Next_State_Datestamp DATETIME);
INSERT INTO @V_Machine_State_Window (Local_Datetime, Machine, State, Duration, State_Text, State_Date, State_Time, Prev_State, Prev_State_Text, Prev_State_Datestamp, Next_State, Next_State_Text, Next_State_Datestamp) VALUES
('2023-06-01 00:00:00.000', 'PC1', 0, 300 , 'Idle', '2023-06-01', '00:00:00.0000000', null, null, null, 1, 'Active', '2023-06-01 00:05:00.000'),
('2023-06-01 00:05:00.000', 'PC1', 1, 600 , 'Active', '2023-06-01', '00:05:00.0000000', 0, 'Idle', '2023-06-01 00:00:00.000', 0, 'Idle', '2023-06-01 00:15:00.000'),
('2023-06-01 00:15:00.000', 'PC1', 0, 373500, 'Idle', '2023-06-01', '00:15:00.0000000', 1, 'Active', '2023-06-01 00:05:00.000', 1, 'Active', '2023-06-05 08:00:00.000'),
('2023-06-05 08:00:00.000', 'PC1', 1, 600 , 'Active', '2023-06-05', '08:00:00.0000000', 0, 'Idle', '2023-06-01 00:15:00.000', 0, 'Idle', '2023-06-05 08:10:00.000'),
('2023-06-01 21:32:00.000', 'PC2', 1, 600, 'Spin', '2023-06-04', '21:32:00.00000', NULL, NULL, NULL, NULL, NULL, NULL),
('2023-06-04 21:32:00.000', 'PC2', 1, 600, 'Spin', '2023-06-04', '21:32:00.00000', NULL, NULL, NULL, NULL, NULL, NULL);
(I added a couple of rows, for a second machine name).
The Prev/Next states are superfluous, as we'll need to calculate them again after the addition of your daily start rows, as we will with the duration column, which seems to be based on the difference in seconds between this rows Local_Datetime
and the next one.
You mention you have a Days table, This example generates that data using a rCTE
to fill in the series between the min and max dates in the table. This is then CROSS APPLY
'd to the unique machine names, so we end up with a day, per machine.
Finally, a result set based on the MachineDays
is created by using an OUTER APPLY
on the table itself to find the most recent actual row to pair with midnight on each day. This is UNION
'd with the generated rows, and we can then perform the LAG
and LEAD
functions on it to fill in the previous, next and duration values.
;WITH Days AS (
SELECT MIN(State_Date) AS Datetime, MAX(State_Date) AS mxDatetime
FROM @V_Machine_State_Window
UNION ALL
SELECT DATEADD(DAY,1,Datetime), mxDatetime
FROM Days
WHERE Datetime < mxDatetime
), Machines AS (
SELECT Machine
FROM @V_Machine_State_Window
GROUP BY Machine
), MachineDays AS (
SELECT CAST(Datetime AS DATETIME) AS Datetime, Machine
FROM Days
CROSS APPLY Machines
)
SELECT Local_Datetime, Machine, State,
DATEDIFF(SECOND,Local_Datetime,LEAD(Local_DateTime, 1) OVER (PARTITION BY Machine ORDER BY Local_Datetime)) AS Duration,
State_Text, State_Date, State_Time,
LAG(State,1) OVER (PARTITION BY Machine ORDER BY Local_Datetime) AS Prev_State, LAG(State_Text,1) OVER (PARTITION BY Machine ORDER BY Local_Datetime) AS Prev_State_Text,
LAG(Local_Datetime,1) OVER (PARTITION BY Machine ORDER BY Local_Datetime) AS Prev_State_Datestamp,
LEAD(State,1) OVER (PARTITION BY Machine ORDER BY Local_Datetime) AS Next_State, LEAD(State_Text,1) OVER (PARTITION BY Machine ORDER BY Local_Datetime) AS Next_State_Text,
LEAD(Local_Datetime,1) OVER (PARTITION BY Machine ORDER BY Local_Datetime) AS Next_State_Datestamp
FROM (
SELECT md.Datetime AS Local_Datetime, a.Machine, a.State, a.Duration, a.State_Text, CAST(md.Datetime AS DATE) AS State_Date, '00:00:00' AS State_Time
FROM MachineDays md
OUTER APPLY (SELECT TOP 1 * FROM @V_Machine_State_Window WHERE Machine = md.Machine AND Local_Datetime < md.Datetime ORDER BY Local_Datetime DESC) a
WHERE a.Machine IS NOT NULL
UNION
SELECT a.Local_Datetime, a.Machine, a.State, a.Duration, a.State_Text, a.State_Date, a.State_Time
FROM @V_Machine_State_Window a
) a
Local_Datetime | Machine | State | Duration | State_Text | State_Date | State_Time | Prev_State | Prev_State_Text | Prev_State_Datestamp | Next_State | Next_State_Text | Next_State_Datestamp |
---|---|---|---|---|---|---|---|---|---|---|---|---|
2023-06-01 00:00:00.000 | PC1 | 0 | 300 | Idle | 2023-06-01 | 00:00:00.0000000 | 1 | Active | 2023-06-01 00:05:00.000 | |||
2023-06-01 00:05:00.000 | PC1 | 1 | 600 | Active | 2023-06-01 | 00:05:00.0000000 | 0 | Idle | 2023-06-01 00:00:00.000 | 0 | Idle | 2023-06-01 00:15:00.000 |
2023-06-01 00:15:00.000 | PC1 | 0 | 85500 | Idle | 2023-06-01 | 00:15:00.0000000 | 1 | Active | 2023-06-01 00:05:00.000 | 0 | Idle | 2023-06-02 00:00:00.000 |
2023-06-02 00:00:00.000 | PC1 | 0 | 86400 | Idle | 2023-06-02 | 00:00:00.0000000 | 0 | Idle | 2023-06-01 00:15:00.000 | 0 | Idle | 2023-06-03 00:00:00.000 |
2023-06-03 00:00:00.000 | PC1 | 0 | 86400 | Idle | 2023-06-03 | 00:00:00.0000000 | 0 | Idle | 2023-06-02 00:00:00.000 | 0 | Idle | 2023-06-04 00:00:00.000 |
2023-06-04 00:00:00.000 | PC1 | 0 | 86400 | Idle | 2023-06-04 | 00:00:00.0000000 | 0 | Idle | 2023-06-03 00:00:00.000 | 0 | Idle | 2023-06-05 00:00:00.000 |
2023-06-05 00:00:00.000 | PC1 | 0 | 28800 | Idle | 2023-06-05 | 00:00:00.0000000 | 0 | Idle | 2023-06-04 00:00:00.000 | 1 | Active | 2023-06-05 08:00:00.000 |
2023-06-05 08:00:00.000 | PC1 | 1 | Active | 2023-06-05 | 08:00:00.0000000 | 0 | Idle | 2023-06-05 00:00:00.000 | ||||
2023-06-01 21:32:00.000 | PC2 | 1 | 8880 | Spin | 2023-06-04 | 21:32:00.0000000 | 1 | Spin | 2023-06-02 00:00:00.000 | |||
2023-06-02 00:00:00.000 | PC2 | 1 | 86400 | Spin | 2023-06-02 | 00:00:00.0000000 | 1 | Spin | 2023-06-01 21:32:00.000 | 1 | Spin | 2023-06-03 00:00:00.000 |
2023-06-03 00:00:00.000 | PC2 | 1 | 86400 | Spin | 2023-06-03 | 00:00:00.0000000 | 1 | Spin | 2023-06-02 00:00:00.000 | 1 | Spin | 2023-06-04 00:00:00.000 |
2023-06-04 00:00:00.000 | PC2 | 1 | 77520 | Spin | 2023-06-04 | 00:00:00.0000000 | 1 | Spin | 2023-06-03 00:00:00.000 | 1 | Spin | 2023-06-04 21:32:00.000 |
2023-06-04 21:32:00.000 | PC2 | 1 | 8880 | Spin | 2023-06-04 | 21:32:00.0000000 | 1 | Spin | 2023-06-04 00:00:00.000 | 1 | Spin | 2023-06-05 00:00:00.000 |
2023-06-05 00:00:00.000 | PC2 | 1 | Spin | 2023-06-05 | 00:00:00.0000000 | 1 | Spin | 2023-06-04 21:32:00.000 |