I am trying to do an aggregation that takes the most common value of the group, like this:
with t1 as (
select
id
, colA
, colB
from some_Table
)
select
id
, count(*) as total
, max(colA) as maxColA
, most_common(colB) -- this is what I'm trying to achieve
from t1
group by id
This is what I have tried to do:
with t1 as (
select
id
, colA
, colB
from some_Table
)
select
id
, count(*) as total
, max(colA) as maxColA
, (select colB, count(colB) as counts from t1 group by colB order by counts desc limit 1) as most_freq_colB_per_id
from t1
group by id
However, it tells me AnalysisException: Subqueries are not supported in the select list
. How else can I do this?
Impala does not -- as far as I know -- have a built-in aggregation function to calculate the mode (the statistical name of what you are trying to calculate).
You can use two levels of aggregation. Your CTE isn't doing anything, so you can do:
select id, sum(total) as total, max(maxColA) as maxColA,
max(case when seqnum = 1 then colB end) as mode
from (select id, colB, count(*) as total, max(colA) as maxColA,
row_number() over (partition by id order by count(*) desc) as seqnum
from sometable
group by id, colb
) t
group by id;