I need to write a Query, which count distinct IDs and aggregate them over time.
for example
and the result should be
For each id
, record the first time the id shows up and then do a cumulative sum:
select hour, sum(count(*)) over (partition by day order by hour)
from (select day, id, min(hour) as hour
from t
group by day, id
) t
group by hour
order by hour;
Note: This assumes that you really want hour within a given day.
You can also express this as:
select day, hour, sum(cnt) over (partition by day order by hour)
from (select day, hour, count(*) as cnt
from (select day, id, min(hour) as hour
from t
group by day, id
) t
group by hour
) h
order by hour;
The above will not include an hour unless there is an new id in that hour. For all hours, you can use window functions instead:
select hour,
sum(sum( (seqnum = 1)::int ) over (partition by day order by hour)
from (select day, id,
row_number() over (partition by day, id order by hour) as seqnum
from t
) t
group by hour
order by hour;
You can also express this as:
select day, hour, sum(cnt) over (partition by day order by hour)
from (select day, hour, sum( (seqnum = 1)::int ) as cnt
from (select day, id,
row_number() over (partition by day, id order by hour) as seqnum
from t
) t
group by hour, day
) dh
order by hour;