Tags: hive, cloudera, sqoop, impala, cloudera-quickstart-vm

TIMESTAMP on Hive table


I'm trying to load data from Oracle into Hive as Parquet. Every time I load a table with a DATE/TIMESTAMP column into Hive, those columns are automatically converted to BIGINT. Is it possible to load TIMESTAMP/DATE formats into Hive using Sqoop and a Parquet file?
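For context, a Sqoop import that produces this behavior looks roughly like the sketch below (the connection string, username, table name, and target directory are placeholders, not my actual command):

    sqoop import \
      --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
      --username scott -P \
      --table ORDERS \
      --as-parquetfile \
      --target-dir /user/hive/warehouse/orders_parquet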

I already tried creating the table first in Hive and then using Impala to LOAD DATA INPATH the Parquet file. It still fails with the error:

"file XX has an incompatible Parquet schema for column XX column: TIMESTAMP"

BTW, I'm using the Cloudera QuickStart VM. Thanks.


Solution

  • From the Cloudera documentation:

    If you use Sqoop to convert RDBMS data to Parquet, be careful with interpreting any resulting values from DATE, DATETIME, or TIMESTAMP columns. The underlying values are represented as the Parquet INT64 type, which is represented as BIGINT in the Impala table. The Parquet values represent the time in milliseconds, while Impala interprets BIGINT as the time in seconds. Therefore, if you have a BIGINT column in a Parquet table that was imported this way from Sqoop, divide the values by 1000 when interpreting as the TIMESTAMP type.
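    In Impala, that advice translates into something like the following sketch (orders_parquet and created_ts are hypothetical names):

    -- The imported BIGINT holds milliseconds since the epoch;
    -- integer-divide by 1000 to get seconds, then cast, because
    -- Impala interprets a numeric-to-TIMESTAMP cast as seconds.
    SELECT CAST(created_ts DIV 1000 AS TIMESTAMP) AS created_ts
    FROM orders_parquet;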

    Alternatively, in Hive you can use an expression like this to get the result in the desired TIMESTAMP format:

    FROM_UNIXTIME(CAST(SUBSTR(timestamp_column, 1, 10) AS INT)) AS timestamp_column
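    Here SUBSTR keeps the first 10 digits of the 13-digit millisecond value, i.e. the whole seconds, which FROM_UNIXTIME then converts to a readable timestamp. Used in a full query against the same hypothetical names as above:

    SELECT FROM_UNIXTIME(CAST(SUBSTR(created_ts, 1, 10) AS INT)) AS created_ts
    FROM orders_parquet;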