I have a SQlite3 table with the following schema:
CREATE TABLE IF NOT EXISTS automationProcessLog (
id integer PRIMARY KEY,
timestamp text NOT NULL, -- expected format yyyy-MM-dd hh:mm:ss.ssssss+ZZ:ZZ
month_day text GENERATED ALWAYS AS (substr(timestamp, 6, 2) || '-' || substr(timestamp, 9, 2)) STORED, -- expected format MM-dd
hour integer GENERATED ALWAYS AS (substr(timestamp, 12, 2)) STORED,
minute integer GENERATED ALWAYS AS (substr(timestamp, 15, 2)) STORED,
weekday text NOT NULL,
sampleID text,
carrierID text,
nodeID text,
IOnode text, -- expected True/False
nodeType string GENERATED ALWAYS AS (substr(nodeID, 1, instr(nodeID, '/')-1)) STORED,
nodeNumberByType integer GENERATED ALWAYS AS (substr(nodeID, instr(nodeID, '/')+1, length(nodeID))) STORED,
processStep text,
data text,
FOREIGN KEY(nodeID) REFERENCES nodes(nodeID)
)
This table contains all events for samples (sampleID
) transported by carriers (carrierID
) between nodes (nodeID
) on an automation. A carrier can only carry one sample at a time.
I would like to create a view where I have sampleID
, carrierID
, timestamp
when the sample was introduced to the automation, nodeID
where the sample was introduced, timestamp
when the sample was handled (i.e. anything but loaded), nodeID
where the sample was handled, and the duration
from load to handled.
IOnode = "True"
and processStep = "Processed"
duration
is still counted from when the sample was loaded onto the automation.duration
is to be counted from the last loading of the sample.Some pseudo-logic would be to select sampleID
, carrierID
, timestamp
(timestamp_processed
), and nodeID
(nodeID_processed
) WHERE NOT (IOnode = "True" AND processStep = "Processed")
, to this add timestamp
(timestamp_loaded
) and nodeID
(nodeID_loaded
) for the row with the same sampleID
and carrierID
with the latest (greatest) timestamp
that is earlier (less) than timestamp_processed
WHERE (IOnode = "True" AND processStep = "Processed")
, and to this calculate the duration
between the timestamps.
I have been looking at the LAG
function as well as answers to questions like this one, but I have not been able to produce something that works.
You can use a subquery or CTE to extract loading times, using the lead()
window function to also get the next load time for the same sample.
Then join it with the main table to get the corresponding processing steps and durations.
WITH load_times AS (
SELECT sampleID, timestamp as timestamp_loaded, nodeID as nodeID_loaded,
lead(timestamp,1,'2999-12-31') OVER (PARTITION BY sampleID ORDER BY timestamp) as next_load
FROM automationProcessLog
WHERE IOnode = 'True' and processStep = 'Processed'
)
SELECT sampleID, carrierID, timestamp as timestamp_processed, nodeID as nodeID_processed,
timestamp_loaded, nodeID_loaded,
(unixepoch(timestamp_processed) - unixepoch(timestamp_loaded)) as duration
FROM automationProcessLog l
JOIN load_times lt ON (l.sampleID = lt.sampleID AND l.timestamp BETWEEN lt.timestamp_loaded AND lt.next_load)
WHERE NOT (l.IOnode = 'True' and l.processStep = 'Processed')