
How to take the average (mean) of a timestamp ('yyyy-mm-dd hh:mm:ss') in Hive?


I have a logs table that looks like this:

  user_name               idle_hours            working_hours       start_time       stop_time
[email protected]     2019-10-24 05:05:00  2019-10-24 05:50:00  2019-10-24 08:30:02  2019-10-24 19:25:02
[email protected]      2019-10-24 02:15:00  2019-10-24 08:39:59  2019-10-24 08:30:02  2019-10-24 19:25:01
[email protected]     2019-10-24 01:30:00  2019-10-24 09:24:59  2019-10-24 08:30:02  2019-10-24 19:25:01
[email protected]     2019-10-24 00:30:00  2019-10-24 09:10:01  2019-10-24 08:45:01  2019-10-24 18:25:02
[email protected]    2019-10-24 03:15:00  2019-10-24 07:19:59  2019-10-24 08:50:02  2019-10-24 19:25:01
[email protected]  2019-10-24 01:55:00  2019-10-24 08:40:00  2019-10-24 08:50:01  2019-10-24 19:25:01
[email protected]  2019-10-24 00:35:00  2019-10-24 09:55:00  2019-10-24 08:55:01  2019-10-24 19:25:01
[email protected]        2019-10-24 02:35:00  2019-10-24 08:04:59  2019-10-24 08:45:02  2019-10-24 19:25:01
[email protected]   2019-10-24 01:10:00  2019-10-24 08:39:59  2019-10-24 09:00:02  2019-10-24 18:50:01

I want to find the average working hours and select the rows that fall below it.

select * from workinglogs where unix_timestamp(working_hours) < AVG(unix_timestamp(working_hours));

When I run this query, it fails with the following error:

FAILED: SemanticException [Error 10128]: Line 1:64 Not yet supported place for UDAF 'AVG'


Solution

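  • AVG is an aggregate function (a UDAF), and Hive does not accept aggregates directly in a WHERE clause, which is what error 10128 reports. Compute the average in a subquery first, then filter against it in the outer query.

    For example, with your data: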

    +------------------------+-------------------------+----------------------------+-------------------------+------------------------+--+
    | workinglogs.user_name  | workinglogs.idle_hours  | workinglogs.working_hours  | workinglogs.start_time  | workinglogs.stop_time  |
    +------------------------+-------------------------+----------------------------+-------------------------+------------------------+--+
    | [email protected]      | 2019-10-24 02:15:00.0   | 2019-10-24 08:39:59.0      | 2019-10-24 08:30:02.0   | 2019-10-24 19:25:01.0  |
    | [email protected]     | 2019-10-24 01:30:00.0   | 2019-10-24 09:24:59.0      | 2019-10-24 08:30:02.0   | 2019-10-24 19:25:01.0  |
    | [email protected]     | 2019-10-24 00:30:00.0   | 2019-10-24 09:10:01.0      | 2019-10-24 08:45:01.0   | 2019-10-24 18:25:02.0  |
    | [email protected]    | 2019-10-24 03:15:00.0   | 2019-10-24 07:19:59.0      | 2019-10-24 08:50:02.0   | 2019-10-24 19:25:01.0  |
    | [email protected]  | 2019-10-24 01:55:00.0   | 2019-10-24 08:40:00.0      | 2019-10-24 08:50:01.0   | 2019-10-24 19:25:01.0  |
    | [email protected]  | 2019-10-24 00:35:00.0   | 2019-10-24 09:55:00.0      | 2019-10-24 08:55:01.0   | 2019-10-24 19:25:01.0  |
    | [email protected]        | 2019-10-24 02:35:00.0   | 2019-10-24 08:04:59.0      | 2019-10-24 08:45:02.0   | 2019-10-24 19:25:01.0  |
    | [email protected]   | 2019-10-24 01:10:00.0   | 2019-10-24 08:39:59.0      | 2019-10-24 09:00:02.0   | 2019-10-24 18:50:01.0  |
    +------------------------+-------------------------+----------------------------+-------------------------+------------------------+--+
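
    Before filtering, it can help to look at the average on its own. The sketch below assumes working_hours is a TIMESTAMP (or a 'yyyy-MM-dd HH:mm:ss' string) and that truncating the mean to whole seconds is acceptable; the alias avg_working_hours is only illustrative. It averages the epoch seconds and converts the result back into a readable timestamp:

    ```sql
    -- unix_timestamp() turns each timestamp into epoch seconds, AVG() returns a DOUBLE,
    -- the CAST truncates it to whole seconds, and from_unixtime() formats it again.
    -- Assumption: working_hours is a TIMESTAMP or a 'yyyy-MM-dd HH:mm:ss' string.
    SELECT from_unixtime(CAST(AVG(unix_timestamp(working_hours)) AS BIGINT)) AS avg_working_hours
    FROM workinglogs;
    ```

    Both conversions use the same session time zone, so the round trip should give a value directly comparable to the working_hours column.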
    

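    Query with the subquery (written as a CTE):

    -- t holds a single row: the mean of working_hours in epoch seconds
    WITH t AS(
    SELECT ROUND(AVG(unix_timestamp(working_hours)),2) as average
    FROM workinglogs)
    -- joining against the one-row CTE makes the average available to WHERE
    SELECT w.user_name,w.idle_hours,w.working_hours,w.start_time,w.stop_time 
    FROM workinglogs AS w,t
    WHERE unix_timestamp(w.working_hours) < t.average;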
    

    Output:

    +------------------------+------------------------+------------------------+------------------------+------------------------+--+
    |      w.user_name       |      w.idle_hours      |    w.working_hours     |      w.start_time      |      w.stop_time       |
    +------------------------+------------------------+------------------------+------------------------+------------------------+--+
    | [email protected]      | 2019-10-24 02:15:00.0  | 2019-10-24 08:39:59.0  | 2019-10-24 08:30:02.0  | 2019-10-24 19:25:01.0  |
    | [email protected]    | 2019-10-24 03:15:00.0  | 2019-10-24 07:19:59.0  | 2019-10-24 08:50:02.0  | 2019-10-24 19:25:01.0  |
    | [email protected]  | 2019-10-24 01:55:00.0  | 2019-10-24 08:40:00.0  | 2019-10-24 08:50:01.0  | 2019-10-24 19:25:01.0  |
    | [email protected]        | 2019-10-24 02:35:00.0  | 2019-10-24 08:04:59.0  | 2019-10-24 08:45:02.0  | 2019-10-24 19:25:01.0  |
    | [email protected]   | 2019-10-24 01:10:00.0  | 2019-10-24 08:39:59.0  | 2019-10-24 09:00:02.0  | 2019-10-24 18:50:01.0  |
    +------------------------+------------------------+------------------------+------------------------+------------------------+--+
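
    A possible alternative, if your Hive version supports windowing functions, is to attach the overall average to every row with AVG(...) OVER () and filter in an outer query. This is only a sketch: the column alias avg_ts and the subquery alias x are illustrative names, and it has not been run against your table.

    ```sql
    -- The empty OVER () clause makes the window span the whole table,
    -- so every row carries the same overall average in avg_ts.
    SELECT user_name, idle_hours, working_hours, start_time, stop_time
    FROM (
      SELECT w.*,
             AVG(unix_timestamp(working_hours)) OVER () AS avg_ts
      FROM workinglogs w
    ) x
    WHERE unix_timestamp(working_hours) < avg_ts;
    ```

    This avoids the join against the one-row subquery. The filter still has to sit in the outer query, because a window function cannot be used inside WHERE either.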