Search code examples
sqldatabaseimpala

Impala: change the column type prior to perform the aggregation function for group by


I have a table, my_table:

transaction_id    |   money     |  team
--------------------------------------------
    1             |   10        |   A
    2             |   20        |   B
    3             |   null      |   A
    4             |   30        |   A
    5             |   16        |   B
    6             |   12        |   B

When I group by team, I can compute max, min through query:

select team, max(money), min(money) from my_table group by team

However, I can't do avg and sum because there is null. i.e:

select team, avg(money), sum(money) from my_table group by team

would fail.

Is there a way to change the column type prior to computing the avg and sum? i.e. I want the output to be:

team   |  avg(money)   |  sum(money)
--------------------------------------
 A     |  20           |  40
 B     |  16           |  48

Thanks!


Solution

  • Per documentation provided by Cloudera your query should be working as-is. Both AVG Function and SUM Function ignore null.

    SELECT team, AVG(money), SUM(money)
    FROM my_table
    GROUP BY team
    

    UPDATE: Per your comment, again I'm not familiar with Impala. Presumably standard SQL will work. Your error appears to be a datatype issue.

    SELECT team, AVG(CAST(money AS INT)), SUM(CAST(money AS INT))
    FROM my_table
    GROUP BY team