Search code examples
sqlselectsubqueryimpala

(Impala) Selecting most common value in field results in "Subqueries are not supported in select list"


I am trying to do an aggregation that takes the most common value of the group, like this:

with t1 as (
    select
        id
        , colA
        , colB
    from some_Table
)
select 
    id
    , count(*) as total
    , max(colA) as maxColA
    , most_common(colB) -- this is what I'm trying to achieve
from t1
group by id

This is what I have tried to do:

with t1 as (
    select
        id
        , colA
        , colB
    from some_Table
)
select 
    id
    , count(*) as total
    , max(colA) as maxColA
    , (select colB, count(colB) as counts from t1 group by colB order by counts desc limit 1) as most_freq_colB_per_id
from t1
group by id

However, it tells me AnalysisException: Subqueries are not supported in the select list. How else can I do this?


Solution

  • Impala does not -- as far as I know -- have a built-in aggregation function to calculate the mode (the statistical name of what you are trying to calculate).

    You can use two levels of aggregation. Your CTE isn't doing anything, so you can do:

    select id, sum(total) as total, max(maxColA) as maxColA,
           max(case when seqnum = 1 then colB end) as mode
    from (select id, colB, count(*) as total, max(colA) as maxColA,
                 row_number() over (partition by id order by count(*) desc) as seqnum
          from sometable
          group by id, colb
         ) t
    group by id;