I have this table (myt):
CREATE TABLE myt (
name VARCHAR(50),
food VARCHAR(50),
d1 INT
);
INSERT INTO myt (name, food, d1) VALUES
('john', 'pizza', 2010),
('john', 'pizza', 2011),
('john', 'cake', 2012),
('tim', 'apples', 2015),
('david', 'apples', 2020),
('david', 'apples', 2021),
('alex', 'cookies', 2005),
('alex', 'cookies', 2006);
name food d1 food_year
john pizza 2010 2010
john pizza 2011 2011
john cake 2012 2012
tim apples 2015 2015
david apples 2020 2020
david apples 2021 2021
alex cookies 2005 2005
alex cookies 2006 2006
I wrote the following query to find out the percentage breakdowns of each food per name:
WITH FoodCounts AS (
SELECT name,
food,
COUNT(*) as food_count
FROM myt
GROUP BY name, food
),
TotalCounts AS (
SELECT name,
COUNT(*) as total_count
FROM myt
GROUP BY name
)
SELECT fc.name,
fc.food,
(fc.food_count * 100.0) / tc.total_count as percentage
FROM FoodCounts fc
JOIN TotalCounts tc
ON fc.name = tc.name;
name food percentage
alex cookies 100.00000
david apples 100.00000
john cake 33.33333
john pizza 66.66667
tim apples 100.00000
I am now trying to modify this query to find out the cumulative percentages. For example, as of 2011 - what was John's food breakdown? As of 2012 what was John's food breakdown?
I tried to write a series of CTEs using window functions to answer this:
WITH YearlyFoodCounts AS (
SELECT name,
food,
food_year,
COUNT(*) as food_count
FROM myt
GROUP BY name, food, food_year
),
CumulativeCounts AS (
SELECT name,
food_year,
SUM(food_count) OVER (PARTITION BY name ORDER BY food_year) as cumulative_count
FROM YearlyFoodCounts
)
SELECT yfc.name,
yfc.food,
yfc.food_year,
yfc.food_count,
cc.cumulative_count,
(yfc.food_count * 100.0) / cc.cumulative_count as percentage
FROM YearlyFoodCounts yfc
JOIN CumulativeCounts cc
ON yfc.name = cc.name AND yfc.food_year = cc.food_year
ORDER BY yfc.name, yfc.food_year;
The result seems to be in the correct format:
name food food_year food_count cumulative_count percentage
alex cookies 2005 1 1 100.00000
alex cookies 2006 1 2 50.00000
david apples 2020 1 1 100.00000
david apples 2021 1 2 50.00000
john pizza 2010 1 1 100.00000
john pizza 2011 1 2 50.00000
john cake 2012 1 3 33.33333
tim apples 2015 1 1 100.00000
Is this the correct way to approach this problem?
You are over-complicating this. It doesn't need joins or subqueries, you can do it in a single level using window functions. You can put a normal aggregate inside a window function, as window functions run after normal aggregation.
Note:
ROWS UNBOUNDED PRECEDING
because the default is RANGE UNBOUNDED PRECEDING
which is subtly different.food
per name, food_year
pair. So you should aggregate by only those two columns.SELECT
name,
MIN(food) AS food,
food_year,
COUNT(*) as food_count,
SUM(COUNT(*)) OVER (PARTITION BY name ORDER BY food_year ROWS UNBOUNDED PRECEDING) as cumulative_count,
COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (PARTITION BY name ORDER BY food_year ROWS UNBOUNDED PRECEDING) as percentage
FROM myt
GROUP BY
name,
food_year;