Search code examples
sqlhdfsimpala

Impala: all DISTINCT aggregate functions need to have the same set of parameters


I got the following error in my Impala query:

select 
   upload_key, 
   max(my_timestamp) as upload_time, 
   max(color_key) as max_color_fk, 
   count(distinct color_key) as color_count, 
   count(distinct id) as toy_count 
from upload_table 
group by upload_key

and got the error:

AnalysisException: all DISTINCT aggregate functions need to have the same set of parameters as count(DISTINCT color_key); deviating function: count(DISTINCT id)

I am not sure why I got this error. What I did was for each group (grouped by upload_key), I tried to compute how many distinct id as well as how many distinct color_key.
Does any one have any idea


Solution

  • The error message indicates that DISTINCT is only allowed on one column [combination], but you try two, color_key& id. A workaround would be two Selects and then a join:

    select
       t1.upload_key,
       t1.upload_time,
       t1.max_color_fk,
       t1.color_count,
       t2.toy_count
    from
     (
       select 
          upload_key, 
          max(my_timestamp) as upload_time, 
          max(color_key) as max_color_fk, 
          count(distinct color_key) as color_count
       from upload_table 
       group by upload_key
     ) as t1
    join
     (
       select 
          upload_key
          count(distinct id) as toy_count 
       from upload_table 
       group by upload_key
     ) as t2
    on t1. upload_key = t2.upload_key