I want to calculate the means and medians of the column3, separately for different categories of column1 and column2. So basically I would like to calculate the means and medians grouped by column1 and column2.
The data looks something like this:
Table `xx.yy.zz`
column1 column2 column3
A A1 1
A A1 2
A A1 3
B B2 10
B B2 15
B B2 20
...
The desired output would be:
column1 column2 median3 mean3
A A1 2 2
A A2 median mean
A A3 median mean
B B1 median mean
B B2 15 15
C C1 median mean
I tried with the code below. The first part of the code (table1) does the right work while both trials for the second part (table2) don't work. What I am doing wrong? What would be the correct way to calculate the median of the column3 group by column1 and column2?
My code so far:
WITH
table1 AS (SELECT column1, column2,
AVG(column3) AS mean3
FROM xx.yy.zz
GROUP BY 1,2
),
table2 AS (SELECT column1, column2,
PERCENTILE_CONT(column3, 0.5) OVER(PARTITION BY column1, column2,) AS median3
FROM xx.yy.zz
group by 1,2
),
**OR**
table2 AS (SELECT
PERCENTILE_CONT(column3, 0.5) OVER(PARTITION BY column1, column2,) AS median3
FROM xx.yy.zz
),
table3 AS (SELECT * FROM table1
INNER JOIN
(SELECT * FROM table2)
USING(column1, column2)
)
SELECT * FROM table3
Below is for BigQuery Standard SQL
#standardsql
create temp function median (arr any type) as (
if(mod(array_length(arr), 2) = 0,
( arr[offset(div(array_length(arr), 2) - 1)] +
arr[offset(div(array_length(arr), 2))]) / 2,
arr[offset(div(array_length(arr), 2))] )
);
select column1, column2,
median(array_agg(column3 order by column3)) as median3,
avg(column3) as mean3
from `xx.yy.zz`
group by column1, column2
if to apply to sample data from your question - output is