Search code examples
sqlclouderaimpalacloudera-cdh

ORDER BY with GROUP BY in Impala SQL


As a research project I've decided to use Cloudera Impala by setting up a whole CDH5 environment. Then I've decided to play with querying data.For some reason a simple ORDER BY does not work for a GROUP BY statement using in Impala SQL. Is this feature supported in Impala?

Here is how my query looks like without sorting:

SELECT TO_DATE(time) AS dt
FROM wearable_data
GROUP BY dt 

Result:

0   2014-01-01
1   2014-07-15
2   2014-07-20
3   2014-07-17

Now the following query does not work:

SELECT TO_DATE(time) AS dt
FROM wearable_data
GROUP BY dt
ORDER BY dt 
-- ORDER BY 1

Result:

Query 6e4da94e0c586e34:7077273d6337e893 100% Complete (23 out of 23)

Explain shows following:

Estimated Per-Host Requirements: Memory=256.00MB VCores=2
WARNING: The following tables are missing relevant table and/or column statistics.
default.wearable_data

04:EXCHANGE [PARTITION=UNPARTITIONED]
|
03:AGGREGATE [MERGE FINALIZE]
|  group by: to_date(time)
|
02:EXCHANGE [PARTITION=HASH(to_date(time))]
|
01:AGGREGATE
|  group by: to_date(time)
|
00:SCAN HDFS [default.wearable_data]
   partitions=1/1 size=1.44KB

Any thoughts on this?


Solution

  • I think this is your issue:

    " Prior to Impala 1.4.0, Impala required that queries using an ORDER BY clause also include a LIMIT clause" Reference here