Search code examples
snowflake-cloud-data-platformsnowflake-schemasnowsql

Nested window function not working in snowflake


I am working on migration of spark sql to snowsql. At one point i got a scenario where i have used nested window functions in spark sql. And i want to migrate that sql query into snowflake. But snowflake doesn't support nested window functions.

Spark sql query -

SELECT 
    *,
    (case when (
        (
            lead(timestamp -lag(timestamp)
                over (partition by session_id order by timestamp))
                over (partition by session_id order by timestamp)
        ) is not null)
     then  
    (
        lead(timestamp -lag(timestamp)
            over (partition by session_id order by timestamp))
            over (partition by session_id order by timestamp)
    ) 
    else 0 end)/1000 as pg_to_pg
FROM dwell_time_step2

Output - spark-sql output I have tried to convert above query into snowflake as below.

Converted Snowsql -

with lagsession as (
SELECT 
    a.*,
    lag(timestamp) over (partition BY session_id   order by timestamp asc) lagsession 
FROM mktg_web_wi.dwell_time_step2 a
)

select 
    a.,
    nvl(lead(a.timestamp - b.lagsession) over (partition BY a.session_id order by   a.timestamp),0)/1000 pg_to_pg
FROM mktg_web_wi.dwell_time_step2 a,
    lagsession b
WHERE a.key=b.key
order by timestamp;

Output -

Snowsql Output

Here, problem is in Snow-sql output. Dwelltime value is getting assigned to different urls.

Expectation is make spark-sql query work on snowsql and output should be same in both cases.

Please let me know if anybody know how this problem can be solved.

Thanks !!


Solution

  • I think that changing this from a nested window function to a cte has changed what records the lag and lead are referring to, but this is tricky to get my head around.

    Regardless, if I'm understanding your code here, I think there's a much simpler approach with only one windows function.

    select 
        a.*,
        (nvl(lead(a.timestamp) over (partition BY a.session_id order by a.timestamp) - a.timestamp)/1000,0) pg_to_pg
    FROM mktg_web_wi.dwell_time_step2 a
    order by timestamp;