Search code examples
sqlwindow-functionsclickhouse

clickhouse window function difficulties - hoew to work with date windows


I have web sessions with utm tags (different channels of traffic: cpc, smm, push). Some of them with tags but some sessions from organic without utm tags. I want to overwrite organic sessions to previous tags

Rules, which I want to use:

  • push channel remains only for the session in which it is registered
  • all other non-empty channels are forwarded to all empty sessions for the current and next day.
  • Channels are not overwritten - that is, if at first there was a cpc channel, and then on the same day there was an smm channel, then cpc sessions go first, and then smm for the current and next day. clickhouse version 22.8.10.29 Example what I want to get

Solution

  • Main Idea use arrays with union all for push channel

        select install_id, session_id, date_uz , started_at,  utm_medium, utm_medium_final
    from (
        SELECT *, arrayFirst(x -> x!='', arrayReverse(utm_medium_array)) as utm_medium_new,
        maxIf(date_uz, utm_medium_new = utm_medium) OVER (PARTITION BY install_id  ORDER BY started_at  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as last_date,
        if(date_uz - last_date < 2, utm_medium_new, '') utm_medium_final
        --any(utm_medium_new) OVER (PARTITION BY install_id  ORDER BY started_at  ROWS BETWEEN 1 PRECEDING AND  1 PRECEDING ) as h,
        from (
            select install_id, session_id, utm_medium, date_uz , started_at,  
            groupArray(utm_medium) OVER (PARTITION BY install_id  ORDER BY started_at  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS utm_medium_array  
            from marketing.sessions_with_attribution swa 
            where date_uz >=today()-50 
            and utm_medium!='push'
            and install_id  in ('1cc69a1f-eb17-4be6-8bfc-a5dee2dd9c50','57927c21-e862-4729-b38e-f663aa9d227d')
        )
        union ALL 
        select install_id, session_id, utm_medium, date_uz , started_at,  
        [] utm_medium_array,   utm_medium , null, utm_medium
        from marketing.sessions_with_attribution swa 
        where date_uz >=today()-50 
        and utm_medium = 'push'
        and install_id  in ('1cc69a1f-eb17-4be6-8bfc-a5dee2dd9c50','57927c21-e862-4729-b38e-f663aa9d227d')
    )
    order by install_id, started_at