Search code examples
sqldatetimehiveimpalaapache-zeppelin

1st Sept coming before 31st Aug in bar chart ordering by date in Zeppelin, how to fix please?


In Zeppelin I have a simple bar chart of value (Y axis) against date (X axis), which was working fine until the new start of this month (today), when it put "1 Sep" before "31 Aug". I am ordering by the date string (as that's the string I need to display on the chart).

Chart image showing most recent dates in the wrong place

Query:

%impala
SELECT FROM_TIMESTAMP(DATE_TRUNC('HOUR', concat(replace(my_timestamp,'"',''), "Z")), 'd MMM HH:mm') AS hours, COUNT(my_number) AS "number per hour"
FROM my_table
WHERE unix_timestamp(my_timestamp) > (unix_timestamp(now()) - 86400)
GROUP BY 1
ORDER BY 1 ASC
LIMIT 24;

I realise the issue is due to the alphanumeric comparison of the date strings. I thought I could fix it by adding a third column for the unix_timestamp() of the date then ordering by that, but that gives a grouping error:

java.sql.SQLException: [Cloudera][ImpalaJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000, errorMessage:AnalysisException: select list expression not produced by aggregation output (missing from GROUP BY clause?): unix_timestamp(my_timestamp)

for this query:

%impala
SELECT FROM_TIMESTAMP(DATE_TRUNC('HOUR', concat(replace(my_timestamp,'"',''), "Z")), 'd MMM HH:mm') AS hours, COUNT(my_number) AS "number per hour", unix_timestamp(my_timestamp)
FROM my_table
WHERE unix_timestamp(my_timestamp) > (unix_timestamp(now()) - 86400)
GROUP BY 1
ORDER BY 3 ASC
LIMIT 24;

How to fix please to get chart in correct order?


Solution

  • calculate additional column in yyyy-MM-dd HH:mm format (same granularity as hours but in sortable format) and add it to groupby (before hours column) and to the order by (instead of hours column) :

    SELECT FROM_TIMESTAMP(DATE_TRUNC('HOUR', concat(replace(my_timestamp,'"',''), "Z")), 'd MMM HH:mm') AS hours, 
           FROM_TIMESTAMP(DATE_TRUNC('HOUR', concat(replace(my_timestamp,'"',''), "Z")), 'yyyy-MM-dd HH:mm') as dt,
           COUNT(my_number) AS "number per hour"
    FROM my_table
    WHERE unix_timestamp(my_timestamp) --also it seems Z should be removed, etc 
          > (unix_timestamp(now()) - 86400)
    GROUP BY dt, hours
    ORDER BY dt
    LIMIT 24;