Search code examples
sqlamazon-redshiftwindow-functionscumulative-frequency

Cumulative count for calculating daily frequency using SQL query (in Amazon Redshift)


I have a dataset contains 'UI' (unique id), time, frequency (frequency for give value in UI column), as it is shown here:

enter image description here

What I would like to add a new column named 'daily_frequency' which simply counts each unique value in UI column for a given day sequentially as I show in the image below.

enter image description here

For example if UI=114737 and it is repeated 2 times in one day, we should have 1, and 2 in the daily_frequency column.

I could do that with Python and Panda package using group by and cumcount methods as follows ...

df['daily_frequency'] = df.groupby(['UI','day']).cumcount()+1

However, for some reason, I must do this via SQL queries (Amazon Redshift).


Solution

  • I think you want a running count, which could be calculated as:

    COUNT(*) OVER (PARTITION BY ui, TRUNC(time) ORDER BY time
                   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS daily_frequency