I have this query that it's supposed to take first n rows of grouped data. I used RANK()
in combination with OVER PARTITION BY
to identify top n rows per group:
SELECT X.USERID, X.ARTID, X.AVGTIMEONPAGE,EDP.ARTDSC,
RANK() OVER (PARTITION BY X.USERID ORDER BY X.AVGTIMEONPAGE DESC) as rank
FROM
(SELECT GANG.userID AS USERID,GANG.avgTimeOnPage AS AVGTIMEONPAGE,
split_part(GANG.pageTitle,' -',1) as ARTID
FROM GoogleAnalytics.navigazioneG AS GANG
WHERE GANG.pagePath LIKE '%DataSheets%' ) AS X
LEFT JOIN ESPDDS.ESP_DPRODUCT AS EDP
ON EDP.ARTID=X.ARTID AND EDP.SCD_IS_CURRENT=1
AND EDP.COMPANYID=1
WHERE X.ARTID NOT LIKE '%Company%' AND rank in (1,2,3,4,5)
it gives me an error saying that rank column doesn't exist. If I comment the last part of WHERE Clause I can see that column rank is computed correctly.
Thanks
The WHERE
clause gets evaluated before the SELECT
clause. So at that time rank
is unknown. You can use a further subquery to access it:
SELECT *
FROM
(
SELECT
X.USERID,
X.ARTID,
X.AVGTIMEONPAGE,
EDP.ARTDSC,
RANK() OVER (PARTITION BY X.USERID ORDER BY X.AVGTIMEONPAGE DESC) as rank
FROM
(
SELECT
GANG.userID AS USERID,
GANG.avgTimeOnPage AS AVGTIMEONPAGE,
split_part(GANG.pageTitle,' -',1) as ARTID
FROM GoogleAnalytics.navigazioneG AS GANG
WHERE GANG.pagePath LIKE '%DataSheets%'
) AS X
LEFT JOIN ESPDDS.ESP_DPRODUCT AS EDP ON EDP.ARTID = X.ARTID
AND EDP.SCD_IS_CURRENT = 1
AND EDP.COMPANYID = 1
WHERE X.ARTID NOT LIKE '%Company%'
) ranked
WHERE rank in (1,2,3,4,5);