I have a hive table that has a timestamp in string format as below,
20190516093836, 20190304125015, 20181115101358
I want to get row count with an aggregate timestamp into hourly as below
date_time count
-----------------------------
2019:05:16: 00:00:00 23
2019:05:16: 01:00:00 64
I followed several links like this but was unable to generate the desired results yet.
This is my final query:
SELECT
DATE_PART('day', b.date_time) AS date_prt,
DATE_PART('hour', b.date_time) AS hour_prt,
COUNT(*)
FROM
(SELECT
from_unixtime(unix_timestamp(`timestamp`, "yyyyMMddHHmmss")) AS date_time
FROM table_name
WHERE from_unixtime(unix_timestamp(`timestamp`, "yyyyMMddHHmmss"))
BETWEEN '2018-12-10 07:02:30' AND '2018-12-12 08:02:30') b
GROUP BY
date_prt, hour_prt
I hope for some guidance from you, thanks in advance
You can extract date_time already in required format 'yyyy-MM-dd HH:00:00'
. I prefer using regexp_replace:
SELECT
date_time,
COUNT(*) as `count`
FROM
(SELECT
regexp_replace(`timestamp`, '^(\\d{4})(\\d{2})(\\d{2})(\\d{2})(\\d{2})(\\d{2})$','$1-$2-$3 $4:00:00') AS date_time
FROM table_name
WHERE regexp_replace(`timestamp`, '^(\\d{4})(\\d{2})(\\d{2})(\\d{2})(\\d{2})(\\d{2})$','$1-$2-$3 $4:$5:$6')
BETWEEN '2018-12-10 07:02:30' AND '2018-12-12 08:02:30') b
GROUP BY
date_time
This will also work:
from_unixtime(unix_timestamp('20190516093836', "yyyyMMddHHmmss"),'yyyy-MM-dd HH:00:00') AS date_time