I have a table with logs of users who used certain service. Something like the table below, each row is timestamp of activity and user id.
user_id | timestamp |
---|---|
831 | 2022-06-22 04:37:10 |
789 | 2022-06-22 12:38:57 |
831 | 2022-06-22 16:40:10 |
I want to calculate number of unique users in each day, but not just in that day, but include a week prior. Basically, moving window unique count: for day "x" count should be in window of "x-7 days":"x".
As I see in docs,
INTERVAL syntax for DateTime RANGE OFFSET frame: not supported, specify the number of seconds instead (RANGE works with any numeric type).
easy way of using interval with passing something like RANGE INTERVAL 7 day PRECEDING
is not supported, and they suggest to use range with passing seconds, but I don't really have experience with range in sql, so I don't really get how do you pass seconds there. My current code:
with cleaned_table as (
select
user_id,
date_trunc('day', timestamp) as day
from
table
)
SELECT
day,
uniqExact(user_id) OVER (
PARTITION by day ORDER BY day range ???
)
FROM
cleaned_table
Also, ideally, I have a feeling that I should add group by
somewhere since I need only one row per each day, not a row for each row in initial table, and without grouping I'm doing recalculation(?) for each row instead of calculating for each day once.
create table t(user_id Int64, timestamp DateTime) Engine = Memory as select * from values((831, '2022-06-22 04:37:10'), (789,'2022-06-22 12:38:57'), (831,'2022-06-22 16:40:10'), (1,'2022-06-21 12:38:57'), (2,'2022-06-20 16:40:10'));
SELECT
day,
finalizeAggregation(u) AS uniqByDay,
uniqMerge(u) OVER (ORDER BY day ASC RANGE BETWEEN 6 PRECEDING AND CURRENT ROW) AS uniqBy6Days
FROM
(
SELECT
toDate(timestamp) AS day,
uniqState(user_id) AS u
FROM t
GROUP BY day
)
ORDER BY day ASC
┌────────day─┬─uniqByDay─┬─uniqBy6Days─┐
│ 2022-06-20 │ 1 │ 1 │
│ 2022-06-21 │ 1 │ 2 │
│ 2022-06-22 │ 2 │ 4 │
└────────────┴───────────┴─────────────┘