I'm trying to figure out how to display the aggregation value of the prior sibling's parent grouping using SQL. Hope the following example will make this clearer:
Given this table:
Region | Department | Cost |
---|---|---|
East | Clothing | 45 |
East | Clothing | 35 |
East | Electronics | 120 |
South | Clothing | 20 |
South | Clothing | 25 |
West | Clothing | 40 |
West | Electronics | 150 |
West | Electronics | 140 |
I would like to create a query which will return the following result-set:
Region | Department | Sum of Cost | Prev Region Sum |
---|---|---|---|
East | Clothing | 80 | NULL |
East | Electronics | 120 | NULL |
South | Clothing | 45 | 200 |
West | Clothing | 40 | 45 |
West | Electronics | 290 | 45 |
Basically, I want the data grouped and aggregated by both Region and Department, but I would also like to have reference to the aggregation based on the Region grouping for the prior Region (given they are alphabetically sorted).
So if you look at the "Prev Region Sum" column in the result set - you can see the first 2 East rows get null
, since there is no region prior to "East", the next South row gets 200 since this is the sum of the cost of all the "East" records, and the "West" rows get 45 since this is the sum of the cost for all the South records.
Here is the SQL for this example in case you want to play around with it. As you can see, I have everything apart from the expression for the "Prev Region Sum" column:
CREATE TABLE so_sales (
Region VARCHAR,
Department VARCHAR,
Cost INTEGER
);
INSERT INTO so_sales (Region, Department, Cost)
VALUES
('East', 'Clothing', 45),
('East', 'Clothing', 35),
('East', 'Electronics', 120),
('South', 'Clothing', 20),
('South', 'Clothing', 25),
('West', 'Clothing', 40),
('West', 'Electronics', 150),
('West', 'Electronics', 140);
SELECT
Region,
Department,
SUM(Cost) AS "Sum of Cost",
LAG(SUM(Cost)) OVER (PARTITION BY Region ORDER BY Region, Department) AS "Prev Region Sum"
FROM
so_sales
GROUP BY
Region,
Department
ORDER BY
Region,
Department;
with data(Region, Department, Cost) as (
select 'East', 'Clothing', 45 union all
select 'East', 'Clothing', 35 union all
select 'East', 'Electronics', 120 union all
select 'South', 'Clothing', 20 union all
select 'South', 'Clothing', 25 union all
select 'West', 'Clothing', 40 union all
select 'West', 'Electronics', 150 union all
select 'West', 'Electronics', 140
)
select region, Department, sum_of_cost, region_sum,
lag(region_sum, rn::int) over(order by region, rn) as prev_region_sum
from (
select region, Department, sum_of_cost,
sum(sum_of_cost) over(partition by region order by region) as region_sum,
row_number() over(partition by region order by department) as rn
from (
select
region, Department,
sum(cost) as sum_of_cost
from data
group by Region, Department
) d
) d
order by region, department
;