I have a table with a snapshot of repeated data for a key and value pair with record_timestamps. The dataset can also contain late-arriving historical record_timestamps as well. I want to generate an output with an EffectiveFrom and EffectiveTo timestamp for each combination of key and value, where the validity closes when the value changes and a new record is created for a new combination. Note: The same values can reappear after an interval with a different record_timestamp.
Snapshot of repeated data
create table equipments_staging
(
equipment string,
item string,
value string,
timestamp timestamp
)
insert into equipments_staging
values
('M','X','1','2023-11-10 13:00')
,('M','X','2','2023-11-11 13:00')
,('M','X','2','2023-11-12 13:00')
,('M','X','1','2023-11-13 13:00')
,('M','X','1','2023-11-14 13:00')
,('M','X','1','2023-11-15 13:00')
,('M','X','2','2023-11-16 13:00')
,('M','X','2','2023-11-17 13:00')
,('M','X','1','2023-11-18 13:00')
Expected Outcome
I have tried the use lead and lag partitioned by the keys and ordered by the timestamp for each record to find the prev and next value and use this to find a way to group sets. However, this does not work as I am missing something since the same value can reappear after an interval, which should not be grouped together.
Select equipment,item,value,
LAG(value,1) OVER (PARTITION BY equipment,item ORDER BY timestamp) AS prev,
LEAD(value,1) OVER (PARTITION BY equipment,item ORDER BY timestamp) AS next,
coalesce(LEAD(timestamp,1),'9999-12-31') OVER (PARTITION BY equipment,item ORDER BY timestamp) AS next_timestamp,
timestamp
from repeated_data
Try this:
with add_grp_id as (
SELECT *
, rank() OVER (PARTITION BY equipment, item ORDER BY timestamp asc)
- rank() OVER (PARTITION BY equipment, item, value ORDER BY timestamp) AS GroupId
FROM equipments_staging
)
select equipment, item, value
,min(timestamp) as effective_from
,max(timestamp) as effective_to
from add_grp_id
group by equipment, item, value, GroupId
order by equipment, item, effective_from
;