I am trying to understand window functions in Snowflake better. I was pretty fine handling simple ones but I stumbled over something that I don't get.
In below code, the commented out line causes an "is not a valid group by expression" error while the one above with double avg works fine and even returns the right value. I don't understand why the double AVG is required to make this work.
SELECT
dex.seller,
date_trunc('month',dex.CREATED_AT_DATE) AS date_base,
AVG(DATEDIFF('days',dex.CREATED_AT_DATE,dex.firstdate)),
AVG(AVG(DATEDIFF('days',dex.CREATED_AT_DATE,dex.firstdate))) OVER (PARTITION BY dex.seller,date_base)--,
--AVG(DATEDIFF('days',dex.CREATED_AT_DATE,dex.firstdate)) OVER (PARTITION BY dex.seller,date_base)
FROM
REP_DATA_SOURCES.DEAL_EXTENDED dex
GROUP BY 1,2
This is a simplification of the actual code.
It is all about logical order of execution.
Qualify In the execution order of a query, QUALIFY is therefore evaluated after window functions are computed. Typically, a SELECT statement’s clauses are evaluated in the order shown below:
- From
- Where
- Group by
- Having
- Window
- QUALIFY
- Distinct
- Order by
- Limit
Window/analytical functions are evaluated after GROUP BY.
Sceanrio 1: Correct
SELECT
dex.seller,
date_trunc('month',dex.CREATED_AT_DATE) AS date_base,
AVG(DATEDIFF('days',dex.CREATED_AT_DATE,dex.firstdate)),
AVG(AVG(DATEDIFF('days',dex.CREATED_AT_DATE,dex.firstdate)))
OVER (PARTITION BY dex.seller,date_base)--,
FROM REP_DATA_SOURCES.DEAL_EXTENDED dex
GROUP BY 1,2;
Scenario 2: Incorrect (analytical average over value that is not part of group by or wrapped with aggregate function)
SELECT
dex.seller,
date_trunc('month',dex.CREATED_AT_DATE) AS date_base,
AVG(DATEDIFF('days',dex.CREATED_AT_DATE,dex.firstdate)),
AVG(DATEDIFF('days',dex.CREATED_AT_DATE,dex.firstdate)) OVER (PARTITION BY dex.seller,date_base)
FROM REP_DATA_SOURCES.DEAL_EXTENDED dex
GROUP BY 1,2;
If we skip the analitical part of query, it is incorrect itself at the GROUP BY
level:
SELECT
dex.seller,
date_trunc('month',dex.CREATED_AT_DATE) AS date_base,
AVG(DATEDIFF('days',dex.CREATED_AT_DATE,dex.firstdate)),
DATEDIFF('days',dex.CREATED_AT_DATE,dex.firstdate)
FROM REP_DATA_SOURCES.DEAL_EXTENDED dex
GROUP BY 1,2;
Expression DATEDIFF('days',dex.CREATED_AT_DATE,dex.firstdate)
is not: