Search code examples
sqldatetimegreatest-n-per-groupsnowflake-cloud-data-platformwindow-functions

Snowflake window function last_value and max


I have a table like this :

enter image description here

I want to get for each user_key the top total_listened per date / month / weekday / week

I think i need to use window functions i can get the different date format with :

MONTH(stream_date) for months
WEEKDAY(stream_date) for weekday
WEEK(stream_date) for week

i tried with this :

select 
MAX(vals.total_listened_per_day) as RECORD_STREAM_DAY_TIME,
MAX(vals.total_listened_per_month) as RECORD_STREAM_MONTH_TIME,
MAX(vals.total_listened_per_week) as RECORD_STREAM_WEEK_TIME,
MAX(vals.most_active_weekday) as MOST_ACTIVE_WEEKDAY_TIME
 last_value(days.date) over (partition by user_key order by days.total_listened) as RECORD_STREAMDAY,
from
(
select user_key, stream_date as date,
sum(st.length_listened) over (partition by user_key, stream_date) as total_listened_per_day,
sum(st.length_listened) over (partition by user_key, MONTH(stream_date)) as total_listened_per_month,
sum(st.length_listened) over (partition by user_key, WEEK(stream_date)) as total_listened_per_week,
sum(st.length_listened) over (partition by user_key, DAYNAME(stream_date)) as most_active_weekday
group by 1,2
 .....
 )

which work for getting the amount (variables with _TIME as the end) , but it doesn't work for getting the specific day / month .... (the variables without _TIME at the end such as RECORD_STREAMDAY) , it's because of the group by , its grouped by stream_date and not month(stream_date) for example , i don't know how i can do it without doin subqueries for each


Solution

  • I think the logic you want is:

    select user_key,
        max(total_listened_per_day  ) as max_total_listened_per_day
        max(total_listened_per_week ) as max_total_listened_per_week,
        max(total_listened_per_month) as max_total_listened_per_month,
        max(case when rn_day   = 1 then date_trunc('day',   stream_date) end) as most_active_day,
        max(case when rn_week  = 1 then date_trunc('week',  stream_date) end) as most_active_week,
        max(case when rn_month = 1 then date_trunc('month', stream_date) end) as most_active_month
    from (  
        select t.*,
            rank() over(partition by user_key order by total_listened_per_day   desc) as rn_day,
            rank() over(partition by user_key order by total_listened_per_week  desc) as rn_week,
            rank() over(partition by user_key order by total_listened_per_month desc) as rn_month
        from (
            select t.*
                sum(st.length_listened) over (partition by user_key, date_trunc('day',   stream_date)) as total_listened_per_day,
                sum(st.length_listened) over (partition by user_key, date_trunc('week',  stream_date)) as total_listened_per_week
                sum(st.length_listened) over (partition by user_key, date_trunc('month', stream_date)) as total_listened_per_month
            from mytable t
        ) t
    ) t
    group by user_key
    

    The most inner subquery computes the window sum of listening time per day, week and month. The next subquery ranks records using that information. Finally, the outer query uses conditional aggregation to bring the corresponding durations and periods. If there are ties, the most recent period is picked.