Tags: sql, hive, timestamp, impala

Get row counts grouped by timestamp into hourly intervals


I have a Hive table with a timestamp column stored as strings, for example:

20190516093836, 20190304125015, 20181115101358

I want the row count aggregated by hour, like this:

date_time              count
-----------------------------
2019-05-16 00:00:00    23
2019-05-16 01:00:00    64

I followed several related answers but have not been able to produce the desired result.

This is my final query:

SELECT 
    DATE_PART('day', b.date_time) AS date_prt, 
    DATE_PART('hour', b.date_time) AS hour_prt, 
    COUNT(*)   
FROM
    (SELECT 
         from_unixtime(unix_timestamp(`timestamp`, "yyyyMMddHHmmss")) AS date_time 
     FROM table_name
     WHERE from_unixtime(unix_timestamp(`timestamp`, "yyyyMMddHHmmss")) 
           BETWEEN '2018-12-10 07:02:30' AND '2018-12-12 08:02:30') b
GROUP BY
    date_prt, hour_prt

I would appreciate any guidance. Thanks in advance.


Solution

  • You can produce date_time directly in the required format 'yyyy-MM-dd HH:00:00', so there is no need to extract separate day and hour parts. I prefer using regexp_replace:

    SELECT 
        date_time, 
        COUNT(*) as `count`
    FROM
        (SELECT 
             regexp_replace(`timestamp`, '^(\\d{4})(\\d{2})(\\d{2})(\\d{2})(\\d{2})(\\d{2})$','$1-$2-$3 $4:00:00') AS date_time 
         FROM table_name
         WHERE regexp_replace(`timestamp`, '^(\\d{4})(\\d{2})(\\d{2})(\\d{2})(\\d{2})(\\d{2})$','$1-$2-$3 $4:$5:$6')
               BETWEEN '2018-12-10 07:02:30' AND '2018-12-12 08:02:30') b
    GROUP BY
        date_time
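
    Because the values are fixed-width digit strings, the same result can probably be reached without a regular expression. The sketch below is not part of the original answer; it assumes every `timestamp` value is exactly 14 digits in yyyyMMddHHmmss order, and it reuses table_name and the date range from the question. Under that assumption, comparing the raw strings is equivalent to comparing the times, so the filter can run on the unconverted column:

    ```sql
    -- Sketch only: assumes every `timestamp` value is exactly 14 digits (yyyyMMddHHmmss).
    SELECT
        date_time,
        COUNT(*) AS `count`
    FROM
        (SELECT
             -- Rebuild 'yyyy-MM-dd HH:00:00' from fixed positions (substr is 1-based)
             concat(substr(`timestamp`, 1, 4), '-',
                    substr(`timestamp`, 5, 2), '-',
                    substr(`timestamp`, 7, 2), ' ',
                    substr(`timestamp`, 9, 2), ':00:00') AS date_time
         FROM table_name
         -- Zero-padded, fixed-width strings sort in chronological order
         WHERE `timestamp` BETWEEN '20181210070230' AND '20181212080230') b
    GROUP BY
        date_time
    ORDER BY
        date_time
    ```

    The ORDER BY is an addition so that the hours come back in sequence, as in the desired output.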
    

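    Alternatively, from_unixtime with an output pattern yields the same hourly string (shown here on a literal value; use the `timestamp` column in the actual query):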

    from_unixtime(unix_timestamp('20190516093836', "yyyyMMddHHmmss"),'yyyy-MM-dd HH:00:00') AS date_time
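
    Put together as a full query, the from_unixtime variant might look like the sketch below. It is written for Hive and assumes the table_name, `timestamp` column, and date range from the question; the ORDER BY is an addition for readable output.

    ```sql
    -- Sketch only: table and column names are taken from the question.
    SELECT
        date_time,
        COUNT(*) AS `count`
    FROM
        (SELECT
             -- Parse the string, then format it back with minutes and seconds fixed to 00
             from_unixtime(unix_timestamp(`timestamp`, 'yyyyMMddHHmmss'),
                           'yyyy-MM-dd HH:00:00') AS date_time
         FROM table_name
         -- Without a pattern, from_unixtime returns 'yyyy-MM-dd HH:mm:ss'
         WHERE from_unixtime(unix_timestamp(`timestamp`, 'yyyyMMddHHmmss'))
               BETWEEN '2018-12-10 07:02:30' AND '2018-12-12 08:02:30') b
    GROUP BY
        date_time
    ORDER BY
        date_time
    ```

    Both unix_timestamp and from_unixtime work in the session time zone, so the round trip should return the same wall-clock hour that is stored in the string.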