I have a log file that contains a timestamp column. The timestamps are in Unix epoch format.
I want to partition the data by year, month, and day, derived from that timestamp.
This is what I have so far, but it throws an error:
PARSE ERROR cannot recognize input '(' in column type
Here is my code.
from (
from raw_data
MAP ${PREFIX}raw_data.line
USING 's3://scripts/clean.py'
AS (timestamp STRING, name STRING)
) map_out
INSERT OVERWRITE TABLE date_base_data_temp PARTITION(year(timestamp), month(timestamp)), day(timestamp)))
select map_out.name;
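Oof, that looks ugly. The PARTITION clause only takes partition column names (optionally with constant values), not function calls, which is why the parser chokes on the '('. To convert the timestamp itself, try using this function in Hive: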
SELECT from_unixtime(unix_timestamp) as new_timestamp from raw_data ...
Or, if the timestamp is in milliseconds instead of seconds:
SELECT from_unixtime(unix_timestamp DIV 1000) as new_timestamp from raw_data ...
That converts a Unix timestamp into a string in yyyy-MM-dd HH:mm:ss format. You can then use the following functions to get the year, month, and day:
SELECT year(new_timestamp) as year, month(new_timestamp) as month, day(new_timestamp) as day ...
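Putting the pieces together, here is a rough sketch of how the original statement might look as a dynamic-partition insert. It rests on several assumptions that are not in the question: that date_base_data_temp is partitioned by columns named year, month and day, that the timestamp is in seconds, and that dynamic partitioning is allowed on your cluster. The column is renamed to ts here only to avoid confusion with the TIMESTAMP type name, and the CAST is there because the script emits the value as a STRING.

```sql
-- Assumption: dynamic partitioning has to be switched on for this session.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

FROM (
  FROM raw_data
  MAP ${PREFIX}raw_data.line
  USING 's3://scripts/clean.py'
  AS (ts STRING, name STRING)   -- ts is a hypothetical rename of the timestamp column
) map_out
-- PARTITION lists only the partition column names (assumed: year, month, day).
INSERT OVERWRITE TABLE date_base_data_temp PARTITION (year, month, day)
SELECT
  map_out.name,
  -- The partition values come last in the SELECT, in the same order as above.
  year(from_unixtime(CAST(map_out.ts AS BIGINT))),
  month(from_unixtime(CAST(map_out.ts AS BIGINT))),
  day(from_unixtime(CAST(map_out.ts AS BIGINT)));
```

If the timestamps turn out to be in milliseconds, the same DIV 1000 trick applies inside each from_unixtime call.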