I have a dataset that looks like this:
ts c1 c2 c3
2019-01-04T01:50:00.000Z C 25.48801612854004 33.317527770996094
2019-01-04T01:51:00.000Z C 25.74610710144043 33.392295837402344
2019-01-04T01:52:00.000Z C 25.978872299194336 33.29177474975586
2019-01-04T01:53:00.000Z B 26.12158203125 33.2805061340332
2019-01-04T01:54:00.000Z B 26.28511619567871 33.26923751831055
2019-01-04T01:55:00.000Z C 26.470335006713867 33.25796890258789
2019-01-04T01:56:00.000Z C 26.63957977294922 33.24669647216797
2019-01-04T01:57:00.000Z C 26.954004287719727 33.23542785644531
2019-01-04T01:58:00.000Z C 27.08258056640625 33.224159240722656
2019-01-04T01:59:00.000Z A 27.25551986694336 33.212890625
2019-01-04T02:00:00.000Z A 27.514263153076172 33.201622009277344
2019-01-04T02:01:00.000Z A 27.588970184326172 33.17148971557617
2019-01-04T02:02:00.000Z B 27.727638244628906 33.13819122314453
2019-01-04T02:03:00.000Z B 27.956039428710938 33.104896545410156
2019-01-04T02:04:00.000Z B 28.152463912963867 33.10499954223633
I want to get the first and last value of "ts" for each run of consecutive identical values in column "c1". I tried the query below, but it doesn't return the correct results:
SELECT ts, c1, c2, c3,
first_value(ts) OVER (partition by c1 order by ts
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as first,
last_value(ts) OVER (partition by c1 order by ts
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as last
FROM `default`.`a07_a15`
Issue: "first" returns only three distinct ts values, and "last" is completely wrong.
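To see why the query misbehaves, the following sketch runs it against the sample data in an in-memory SQLite database. SQLite is only a stand-in because the question does not name its engine, and it needs SQLite 3.25+ for window functions. Only ts and c1 are loaded, and the aliases are renamed to first_ts/last_ts. The sketch shows two things. First, PARTITION BY c1 merges every row with the same c1, regardless of gaps. Second, a frame that ends at CURRENT ROW makes last_value() return the row's own ts.

```python
import sqlite3

# Sample data from the question (c2/c3 omitted): one row per minute from 01:50.
C1_VALUES = "CCCBBCCCCAAABBB"
ROWS = [
    (f"2019-01-04T{(110 + i) // 60:02d}:{(110 + i) % 60:02d}:00.000Z", c1)
    for i, c1 in enumerate(C1_VALUES)
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ts TEXT, c1 TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", ROWS)

# The question's query, with the aliases renamed to first_ts / last_ts.
result = conn.execute("""
    SELECT ts, c1,
           first_value(ts) OVER (PARTITION BY c1 ORDER BY ts
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS first_ts,
           last_value(ts) OVER (PARTITION BY c1 ORDER BY ts
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS last_ts
    FROM t
    ORDER BY ts
""").fetchall()

# PARTITION BY c1 puts every C row in one partition (likewise A and B),
# so first_ts takes only three distinct values: one per c1 value.
print(sorted({row[2] for row in result}))
# The frame stops at CURRENT ROW, so last_value() is just the row's own ts.
print(all(row[3] == row[0] for row in result))
```

Dropping the frame clause would not help either. With ORDER BY and the default frame, last_value() still stops at the current row (ts values are unique here). The partition has to be the run, not c1.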
Expected: I need the first and last ts for every run of consecutive repeated c1 values:
ts c1 c2 c3 first last
2019-01-04T01:50:00.000Z C 25.48801612854004 33.317527770996094 2019-01-04T01:50:00.000Z 2019-01-04T01:52:00.000Z
2019-01-04T01:51:00.000Z C 25.74610710144043 33.392295837402344 2019-01-04T01:50:00.000Z 2019-01-04T01:52:00.000Z
2019-01-04T01:52:00.000Z C 25.978872299194336 33.29177474975586 2019-01-04T01:50:00.000Z 2019-01-04T01:52:00.000Z
2019-01-04T01:53:00.000Z B 26.12158203125 33.2805061340332 2019-01-04T01:53:00.000Z 2019-01-04T01:54:00.000Z
2019-01-04T01:54:00.000Z B 26.28511619567871 33.26923751831055 2019-01-04T01:53:00.000Z 2019-01-04T01:54:00.000Z
2019-01-04T01:55:00.000Z C 26.470335006713867 33.25796890258789 2019-01-04T01:55:00.000Z 2019-01-04T01:58:00.000Z
2019-01-04T01:56:00.000Z C 26.63957977294922 33.24669647216797 2019-01-04T01:55:00.000Z 2019-01-04T01:58:00.000Z
2019-01-04T01:57:00.000Z C 26.954004287719727 33.23542785644531 2019-01-04T01:55:00.000Z 2019-01-04T01:58:00.000Z
2019-01-04T01:58:00.000Z C 27.08258056640625 33.224159240722656 2019-01-04T01:55:00.000Z 2019-01-04T01:58:00.000Z
2019-01-04T01:59:00.000Z A 27.25551986694336 33.212890625 2019-01-04T01:59:00.000Z 2019-01-04T02:01:00.000Z
2019-01-04T02:00:00.000Z A 27.514263153076172 33.201622009277344 2019-01-04T01:59:00.000Z 2019-01-04T02:01:00.000Z
2019-01-04T02:01:00.000Z A 27.588970184326172 33.17148971557617 2019-01-04T01:59:00.000Z 2019-01-04T02:01:00.000Z
2019-01-04T02:02:00.000Z B 27.727638244628906 33.13819122314453 2019-01-04T02:02:00.000Z 2019-01-04T02:04:00.000Z
2019-01-04T02:03:00.000Z B 27.956039428710938 33.104896545410156 2019-01-04T02:02:00.000Z 2019-01-04T02:04:00.000Z
2019-01-04T02:04:00.000Z B 28.152463912963867 33.10499954223633 2019-01-04T02:02:00.000Z 2019-01-04T02:04:00.000Z
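The expected output treats each run of consecutive identical c1 values, ordered by ts, as its own group. The sketch below expresses that definition in plain Python so it can serve as a reference when checking any SQL answer. It uses itertools.groupby, which only merges adjacent equal keys. The helper name first_last_per_run is hypothetical, and the sketch assumes the rows are already sorted by ts.

```python
from itertools import groupby

# Sample data from the question (c2/c3 omitted), already sorted by ts.
C1_VALUES = "CCCBBCCCCAAABBB"
ROWS = [
    (f"2019-01-04T{(110 + i) // 60:02d}:{(110 + i) % 60:02d}:00.000Z", c1)
    for i, c1 in enumerate(C1_VALUES)
]


def first_last_per_run(rows):
    """Return (ts, c1, first_ts, last_ts) for each row, where first_ts/last_ts
    bound the run of consecutive rows sharing that row's c1 value."""
    out = []
    # groupby starts a new group whenever c1 changes, i.e. one group per run.
    for _, run in groupby(rows, key=lambda r: r[1]):
        run = list(run)
        first_ts, last_ts = run[0][0], run[-1][0]
        out.extend((ts, c1, first_ts, last_ts) for ts, c1 in run)
    return out


expected = first_last_per_run(ROWS)
for row in expected:
    print(*row)
```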
Use lag() and lead() to find the rows where c1 changes:
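select t.*
from (select t.*,
             lag(c1) over (order by ts) as prev_c1,   -- c1 of the previous row by time
             lead(c1) over (order by ts) as next_c1   -- c1 of the next row by time
      from t
     ) t
-- keep rows that start a run (prev differs or is null) or end one (next differs or is null)
where prev_c1 is null or next_c1 is null or
      prev_c1 <> c1 or next_c1 <> c1;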
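The sketch below runs the lag()/lead() query against the question's sample data. It uses Python's built-in sqlite3 module as a stand-in engine, on the assumption that SQLite 3.25+ behaves like the real engine for these window functions. Each run of consecutive c1 values contributes its first and last row, so the five runs yield ten rows. A run of length one would contribute a single row.

```python
import sqlite3

# Sample data from the question (c2/c3 omitted): one row per minute from 01:50.
C1_VALUES = "CCCBBCCCCAAABBB"
ROWS = [
    (f"2019-01-04T{(110 + i) // 60:02d}:{(110 + i) % 60:02d}:00.000Z", c1)
    for i, c1 in enumerate(C1_VALUES)
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ts TEXT, c1 TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", ROWS)

boundary_rows = conn.execute("""
    SELECT ts, c1
    FROM (SELECT t.*,
                 lag(c1) OVER (ORDER BY ts) AS prev_c1,
                 lead(c1) OVER (ORDER BY ts) AS next_c1
          FROM t) t
    -- a row starts a run if prev_c1 differs, and ends one if next_c1 differs
    WHERE prev_c1 IS NULL OR next_c1 IS NULL
       OR prev_c1 <> c1 OR next_c1 <> c1
    ORDER BY ts
""").fetchall()

for ts, c1 in boundary_rows:
    print(ts, c1)
```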
This returns the first and last row of each run as separate rows. If you want both values in the same row, treating this as a gaps-and-islands problem is probably the simplest solution:
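select c1, min(ts) as first_ts, max(ts) as last_ts
from (select t.*,
             -- position of the row in the whole table
             row_number() over (order by ts) as seqnum,
             -- position of the row among rows with the same c1
             row_number() over (partition by c1 order by ts) as seqnum_2
      from t
     ) t
-- seqnum - seqnum_2 is constant within each run of consecutive identical c1 values
group by c1, (seqnum - seqnum_2)
order by first_ts;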
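Why does seqnum - seqnum_2 identify a run? Within a run, both row numbers grow by one per row, so their difference stays constant. The difference jumps whenever rows with a different c1 value come in between. The difference on its own can coincide for different c1 values: in the sequence A, B, A, B, for example, the second A row and the first B row both get 1. That is why c1 also stays in the GROUP BY. The sketch below uses SQLite as an assumed stand-in engine. It prints the per-row difference for the sample data and then runs the grouped query.

```python
import sqlite3

# Sample data from the question (c2/c3 omitted): one row per minute from 01:50.
C1_VALUES = "CCCBBCCCCAAABBB"
ROWS = [
    (f"2019-01-04T{(110 + i) // 60:02d}:{(110 + i) % 60:02d}:00.000Z", c1)
    for i, c1 in enumerate(C1_VALUES)
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ts TEXT, c1 TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", ROWS)

# Per-row difference of the two row numbers: constant inside each run.
diffs = conn.execute("""
    SELECT ts, c1,
           row_number() OVER (ORDER BY ts)
             - row_number() OVER (PARTITION BY c1 ORDER BY ts) AS grp
    FROM t
    ORDER BY ts
""").fetchall()
for ts, c1, grp in diffs:
    print(ts, c1, grp)

# The gaps-and-islands query: one output row per run.
islands = conn.execute("""
    SELECT c1, min(ts) AS first_ts, max(ts) AS last_ts
    FROM (SELECT t.*,
                 row_number() OVER (ORDER BY ts) AS seqnum,
                 row_number() OVER (PARTITION BY c1 ORDER BY ts) AS seqnum_2
          FROM t) t
    GROUP BY c1, (seqnum - seqnum_2)
    ORDER BY first_ts
""").fetchall()
for c1, first_ts, last_ts in islands:
    print(c1, first_ts, last_ts)
```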
EDIT:
If you need to keep the original rows, compute the same run key and use window functions instead of GROUP BY:
select t.*,
       -- no ORDER BY in the window, so min/max see the whole run
       min(ts) over (partition by c1, (seqnum - seqnum2)) as min_ts,
       max(ts) over (partition by c1, (seqnum - seqnum2)) as max_ts
from (select t.*,
             row_number() over (order by ts) as seqnum,
             row_number() over (partition by c1 order by ts) as seqnum2
      from t
     ) t
order by ts;
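The sketch below checks the window-function version against the expected output in the question, again using SQLite as an assumed stand-in engine. The outer SELECT lists ts, c1, min_ts and max_ts explicitly instead of t.*, so the helper seqnum columns are left out of the result.

```python
import sqlite3

# Sample data from the question (c2/c3 omitted): one row per minute from 01:50.
C1_VALUES = "CCCBBCCCCAAABBB"
ROWS = [
    (f"2019-01-04T{(110 + i) // 60:02d}:{(110 + i) % 60:02d}:00.000Z", c1)
    for i, c1 in enumerate(C1_VALUES)
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ts TEXT, c1 TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", ROWS)

rows = conn.execute("""
    SELECT ts, c1,
           -- the window has no ORDER BY, so the frame is the entire run
           min(ts) OVER (PARTITION BY c1, (seqnum - seqnum2)) AS min_ts,
           max(ts) OVER (PARTITION BY c1, (seqnum - seqnum2)) AS max_ts
    FROM (SELECT t.*,
                 row_number() OVER (ORDER BY ts) AS seqnum,
                 row_number() OVER (PARTITION BY c1 ORDER BY ts) AS seqnum2
          FROM t) t
    ORDER BY ts
""").fetchall()

for row in rows:
    print(*row)
```

The key difference from the question's query is that the partition is the run rather than c1, and there is no frame ending at CURRENT ROW.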