Search code examples
mysqlsqlhadoopselectimpala

Equivalent of FIRST() and LAST() in Impala


My data from different Sensor units is comming with unix_time stam in Hadoop. It is quite easy to get the first and last value of each day in SQL using the FIRST and LAST statement, as given below. Here I am getting the temperature at the start and at the end of the day.

SELECT unit, FIRST(Temp) as Start_Day_Value, LAST(Temp) as End_Day_Value
FROM Sensor_Data.Table
WHERE cast(ts/1000 as TIMESTAMP) BETWEEN "2021-01-05 00:00:00" AND "2021-01-05 23:59:59"
GROUP BY unit
ORDER BY unit;

Is there any equivalent command in the IMPALA so that I can get the first and last value of the each column as am getting in SQL.


Solution

  • I solved the problem in the following way:

    SELECT DISTINCT u.unit, u.initial_temp, u.final_temp, u.initial_p, u.final_p
    FROM(SELECT unit,
    first_value(temperature) OVER(PARTITION BY unit ORDER BY ts DESC) as final_temp,
    first_value(temperature) OVER(PARTITION BY unit ORDER BY ts ASC) as initial_temp,
    first_value(pressure) OVER(PARTITION BY unit ORDER BY ts DESC) as final_p,
    first_value(pressure) OVER(PARTITION BY unit ORDER BY ts ASC) as initial_p
    FROM Sensor_Data.Table
    WHERE cast(ts/1000 as TIMESTAMP) BETWEEN "2020-12-11 00:00:00" AND "2020-12-11 23:59:59"
    ) AS u
    ORDER BY unit;