I have raw data on a database with measurements on sensors a, b, c like this for example:
unique_id;sensor;timestamp
111;A;2024-10-10 08:00:00.000000
222;C;2024-10-10 08:00:00.000000
333;A;2024-10-10 08:00:00.000000
444;A;2024-10-10 08:00:00.000000
555;A;2024-10-10 08:00:00.000000
666;C;2024-10-10 08:00:01.000000
777;C;2024-10-10 08:00:01.000000
888;C;2024-10-10 08:00:01.000000
555;B;2024-10-10 08:00:09.000000
111;A;2024-10-10 08:00:21.000000
444;A;2024-10-10 08:00:24.000000
444;A;2024-10-10 08:00:38.000000
444;B;2024-10-10 08:00:40.000000
666;C;2024-10-10 08:00:42.000000
111;C;2024-10-10 08:00:42.000000
444;B;2024-10-10 08:00:48.000000
111;C;2024-10-10 08:01:02.000000
666;C;2024-10-10 08:01:02.000000
555;C;2024-10-10 08:01:05.000000
444;C;2024-10-10 08:01:09.000000
111;B;2024-10-10 08:01:23.000000
666;C;2024-10-10 08:01:23.000000
222;C;2024-10-10 08:01:23.000000
555;C;2024-10-10 08:01:27.000000
444;C;2024-10-10 08:01:35.000000
555;C;2024-10-10 08:01:38.000000
111;C;2024-10-10 08:01:43.000000
222;C;2024-10-10 08:01:43.000000
555;C;2024-10-10 08:02:00.000000
111;C;2024-10-10 08:02:04.000000
222;C;2024-10-10 08:02:04.000000
666;C;2024-10-10 08:02:04.000000
555;C;2024-10-10 08:02:20.000000
222;C;2024-10-10 08:02:24.000000
111;C;2024-10-10 08:02:24.000000
111;C;2024-10-10 08:02:45.000000
666;C;2024-10-10 08:02:45.000000
666;C;2024-10-10 08:03:05.000000
222;C;2024-10-10 08:03:05.000000
111;C;2024-10-10 08:03:05.000000
111;C;2024-10-10 08:03:26.000000
666;C;2024-10-10 08:03:26.000000
222;C;2024-10-10 08:03:26.000000
I want to query the travel time of each unique_id for the journey from sensor a to sensor c over sensor b, with first appearance at a and last appearance at c. The order must be strict a>b>c. I want a condition to check if this is true. If a unique_id revisits sensor a after beeing at b or revisits b after beeing at c, this unique_id should be discarded from the calculations.
With the help of AI I came up with the following solution:
WITH sensor_sequences AS (
SELECT
sm.unique_id,
MIN(CASE WHEN sm.sensor = 'A' THEN sm.timestamp END) AS first_capture_A,
MIN(CASE WHEN sm.sensor = 'B' THEN sm.timestamp END) AS first_capture_B,
MAX(CASE WHEN sm.sensor = 'C' THEN sm.timestamp END) AS last_capture_C
FROM sensor_measurement sm
WHERE
sm.sensor IN ('A', 'B', 'C')
AND sm.timestamp BETWEEN '2024-10-10 07:00:00' AND '2024-10-10 12:00:00'
GROUP BY sm.unique_id
HAVING
MIN(CASE WHEN sm.sensor = 'A' THEN sm.timestamp END) IS NOT NULL
AND MIN(CASE WHEN sm.sensor = 'B' THEN sm.timestamp END) IS NOT NULL
AND MAX(CASE WHEN sm.sensor = 'C' THEN sm.timestamp END) IS NOT NULL
AND MIN(CASE WHEN sm.sensor = 'B' THEN sm.timestamp END) > MIN(CASE WHEN sm.sensor = 'A' THEN sm.timestamp END)
AND MAX(CASE WHEN sm.sensor = 'C' THEN sm.timestamp END) > MIN(CASE WHEN sm.sensor = 'B' THEN sm.timestamp END)
AND SECONDS_BETWEEN(MAX(CASE WHEN sm.sensor = 'C' THEN sm.timestamp END),MIN(CASE WHEN sm.sensor = 'A' THEN sm.timestamp END)) <= 300 --max time between two timestamps
)
SELECT
ss.unique_id,
ss.first_capture_A AS sensor_A_time,
ss.first_capture_B AS sensor_B_time,
ss.last_capture_C AS sensor_C_time,
SECONDS_BETWEEN(ss.last_capture_C, ss.first_capture_A) AS travel_time_seconds
FROM sensor_sequences ss
ORDER BY ss.first_capture_A;
Problem is, it still shows me travel times where the unique_id went from sensor b back to sensor a and I can not solve this. How can I adjust the query to only get the strict order a>b>c? The unique_id 111 should be discarded, but it is used.
Use LISTAGG
to get the chain of sensors (e.g. 'AABBCCC' for a valid chain or 'ABAC' for an invalid one). Then check with REGEXP_LIKE
that the generated string consists of As followed by Bs then followed by Cs.
select
unique_id
max(timestamp) - min(timestamp) as duration,
from sensor_measurement
where sensor IN ('A', 'B', 'C')
group by unique_id
having regexp_like(listagg(sensor) within group (order by timestamp), '^A+B+C+$')
order by unique_id;