Search code examples
sqlaggregatedistinctvertica

how to count aggregate distinct IDs Per Hour Since Beginning of the day?


I need to write a Query, which count distinct IDs and aggregate them over time.

for example

enter image description here

and the result should be

enter image description here


Solution

  • For each id, record the first time the id shows up and then do a cumulative sum:

    select hour, sum(count(*)) over (partition by day order by hour)
    from (select day, id, min(hour) as hour
          from t
          group by day, id
         ) t
    group by hour
    order by hour;
    

    Note: This assumes that you really want hour within a given day.

    You can also express this as:

    select day, hour, sum(cnt) over (partition by day order by hour)
    from (select day, hour, count(*) as cnt
          from (select day, id, min(hour) as hour
                from t
                group by day, id
               ) t
          group by hour
         ) h
    order by hour;
    

    The above will not include an hour unless there is an new id in that hour. For all hours, you can use window functions instead:

    select hour,
           sum(sum( (seqnum = 1)::int ) over (partition by day order by hour)
    from (select day, id,
                 row_number() over (partition by day, id order by hour) as seqnum
          from t
         ) t
    group by hour
    order by hour;
    

    You can also express this as:

    select day, hour, sum(cnt) over (partition by day order by hour)
    from (select day, hour, sum( (seqnum = 1)::int ) as cnt
          from (select day, id,
                       row_number() over (partition by day, id order by hour) as seqnum
                from t
               ) t
          group by hour, day
         ) dh
    order by hour;