Search code examples
hivetimestamporc

What is the supported range for Hive's Timestamp data types


  • Hadoop Distro: HDP 2.4.2
  • Hive Version: 1.2.1

I have an ORC table tbl1 having timestamp column. I inserted these values:

insert into tbl1 values ('0001-01-01 00:00:20.0');
insert into tbl1 values ('9999-01-01 00:00:20.0');

I created another ORC table tbl2 having timestamp column. Then I tried:

insert into tbl2 select * from tbl1;

Data in tbl2 is corrupted:

1754-08-28 22:44:01.128654848
1815-03-31 05:56:28.066277376
  • Am I using out of range values?
  • Is there any specific range for timestamp?

This problem is only for ORC file format. For other file formats like textfile, avro, parquet, etc it's working fine.


Solution

  • Hive vectorization is enabled in my case.

    hive.vectorized.execution.enabled = true;
    

    This is an issue due to vectorized execution.

    It is fixed in 1.3.0, 2.0.1, 2.1.0

    Check HIVE-9862 and wiki for more details.

    To use vectorized query execution, you must store your data in ORC format. That's why the error is coming only in ORC file format.