Search code examples
postgresqlwindow-functionsnested-select

Calculate lift for every row and compare it with the average lift of region and year


I have a table that looks like the one below:

Shop Year Region Waste Avg Waste (Year,Region) Lift Column_I_want_To_Calculate (apply case when statements) CASE WHEN Lift > Avg(Lift) OVER (PARTITION BY YEAR, REGION) THEN 1 ELSE 0 END
a 2021 CA 10 15 =>(10+20)/2 0.67 => 10/15 0.67 < (0.67+1.34)/2 = 1.005 THEN 0
b 2021 CA 20 15=> (10+20)/2 1.34 => 20/15 1.34 > (0.67+1.34)/2 = 1.005 THEN 1
c 2021 FL 8 8 => 8/1 8/8 8 = 8 THEN 0
d 2020 LA 25 22 => (25+19)/2 0.88 => 25/22 0.88 > (0.88+0.87)/2 = 0.875 THEN 1
e 2020 LA 19 22 => (25+19)/2 0.87 => 19/22 0.87 < (0.88+0.87)/2 = 0.875 THEN 0
f 2019 NY 35 35 35/35 35 = 35 THEN 0

So far I have calculated the columns Shop, Year, Region, Waste, Avg Waste (Year, Region), Lift. I want to calculate the one marked as Column_I_want_To_Calculate.

Briefly, it computes the average lift per Region and Year and compares Shops' Lift with the Average Lift of all shops in the same Region and Year. Then assigns the value 1 or 0 in case of a greater than statement.

So far I have tried (PostgreSQL),

SELECT  shop
        ,year
        ,region
        ,waste
        ,AVG(waste) over (partition by year, region) as "Avg Waste (Year,Region)"
        ,waste/avg(waste) over (partition by year, region) AS Lift,
        ,CASE WHEN waste/avg(waste) over (partition by year, region) > 
           (SELECT tab2.avg_lift 
            FROM (
              SELECT tab1.year, tab1.region, AVG(tab1.lift) OVER (PARTITION BY tab1.year, tab1.region) avg_lift
              FROM (
                  SELECT year, region, waste/ avg(waste) over (partition by year, region) AS lift
                  FROM main_table
                  GROUP BY year,region,waste
                  ORDER BY lift DESC
              ) tab1
              GROUP BY tab1.year, tab1.region, tab1.lift
           ) tab2
        ) THEN 1 ELSE 0 END AS "Column_I_want_To_Calculate"
FROM main_table
GROUP BY shop,
         year,
         nomos,
         waste
;

However, the code above throws the exception

postgresql error: more than one row returned by a subquery used as an expression


Solution

  • This one returns the required output based on your input:

    SELECT  
            region
        ,   shop
        ,   waste
        ,   round(AVG(waste) OVER w,2) AS avg_waste
        ,   round(waste / AVG(waste) OVER w,2) AS lift
        ,   CASE
                WHEN waste > AVG(waste) OVER w THEN 1
                ELSE 0
            END AS above_average
    FROM    i
    WINDOW  w AS (PARTITION BY year, region)
    ORDER BY
        1,2,3;