Search code examples
hiveimpala

Impala mathematical operation containing avg fails with AnalysisException


I am attempting to subtract a value in a column (column_18) from the average of another column (avg(column_19)) and obtain this result as a third column (result) for each row of the table:

cur.execute("Select avg(column_19) - column_18 as result FROM test1")

This doesn't seem to be working well, and I get this error:

impala.error.HiveServer2Error: AnalysisException: select list expression not produced by aggregation output (missing from GROUP BY clause?): SUM(column_19) / COUNT(column_19) - column_18

I do not want the result to be grouped


Solution

  • avg() in this context is an aggregate function, which means that it is applied to a group of rows, which may specified with a GROUP BY clause (or all rows if not specified). The output of an aggregate expression is a single value per-group, so it is not applied per-row as you want.

    However, you can accomplish what you're trying to do in a few ways. I think the easiest is by using avg() as an analytic function. For example, you can do something like:

    select column_19, column_18, (avg(column_19) over () - column_18) as result from test1
    

    See the documentation for more details about how aggregations and analytic functions work.