my data is something like this, timestamp is ordered in asc.
INSERT INTO timeseries (timestamp, value)
VALUES
('2022-01-01 00:00:00', 0.89),
**('2022-01-01 10:01:00', 6.89),
('2022-01-02 10:01:21', 10.99),
('2022-01-02 10:07:00', 11.89),**
('2022-01-02 12:01:00', 0.89),
**('2022-01-02 13:07:00', 6.39),**
('2022-01-02 14:00:00', 0.69),
**('2022-01-03 14:02:00', 5.39),
('2022-01-03 15:04:00', 6.89),
('2022-01-03 15:00:00', 7.3),**
('2022-01-03 15:10:00', 1.89),
('2022-01-03 15:50:00', 0.8);
my ask is to get min and max, so i can produce difference in minutes whenever value is going above 5. in above, there are three data sets formed
min max
2022-01-01 10:01:00 2022-01-02 10:07:00
2022-01-02 13:07:00 2022-01-02 13:07:00
2022-01-03 14:02:00 2022-01-03 15:00:00
I tried row_number to get sequence but since there are three sets there, i need to attribute column so i can use in partition clause but i am not getting any ideas for that?
WITH CTE AS (
SELECT CASE WHEN VALUE>5 THEN 'ON' ELSE 'OFF' END STATUS , TIMESTAMP, VALUE
FROM TIMESERIES)
SELECT ROW_NUMBER() OVER(PARTITION BY STATUS ORDER BY TIMESTAMP) RN,TIMESTAMP,VALUE FROM CTE
ORDER BY TIMESTAMP;
this is giving me row_number for all >5 but I need based on sets of data..
any help to start or a solution is highly appreciated.
BDW, trying this in snowflake but general SQL is also fine.
Standard gaps and islands:
with flagged as (
select *,
case when
lag(val, 1, 0.0) over (order by ts) <= 5
and val > 5 then 1 end as flag
from T
), grouped as (
select *,
count(flag) over (order by ts) as grp
from flagged
)
select
min(ts),
max(case when val > 5 then ts end)
from grouped
group by grp
having max(val) > 5
order by grp;