I have a Parquet table in Hive that has date and timestamp fields. I would now like to read this table from Spark, but it fails with the Parquet timestamp compatibility error.
The Hive version is 1.2.1 and the Spark version is 1.6.1.
Exception in thread "main" java.lang.UnsupportedOperationException: Parquet does not support timestamp. See HIVE-6384
    at org.apache.hadoop.hive.ql.io.parquet.serde.ArrayWritableObjectInspector.getObjectInspector(ArrayWritableObjectInspector.java:98)
    at org.apache.hadoop.hive.ql.io.parquet.serde.ArrayWritableObjectInspector.<init>(ArrayWritableObjectInspector.java:60)
Reading the table from Hive works perfectly fine; however, it fails when read from Spark. Here is the query I am trying to run.
import org.apache.spark.sql.hive._
val sqlContext = new HiveContext(sc)
sqlContext.sql("select * from hivetablename limit 10")
The Hive table looks like this:
CREATE EXTERNAL TABLE hivetablename (col1 string, date_time timestamp, somedate date) PARTITIONED BY (load_date date)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 's3n://path'
Any suggestions or workarounds?
Just a quick check: which Hive version is your Spark referring to? Make sure it is not pointing at an older Hive version (<= Hive 0.13).
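For example, a minimal way to inspect this from the shell (a sketch, not a definitive fix; the config keys below exist in Spark 1.5+, and the default value shown if nothing is set is just a placeholder string):

import org.apache.spark.sql.hive._
val sqlContext = new HiveContext(sc)
// Which Hive metastore client Spark is using; Spark 1.6 defaults to the built-in 1.2.1 client,
// but a cluster-wide spark-defaults.conf can override it with an older one.
println(sqlContext.getConf("spark.sql.hive.metastore.version", "(not set, builtin default)"))
println(sqlContext.getConf("spark.sql.hive.metastore.jars", "builtin"))

If it does turn out to be an older client, these settings can be passed at submit time to force the 1.2.1 client (the jar path below is a placeholder for your actual Hive and Hadoop client jars):

spark-shell \
  --conf spark.sql.hive.metastore.version=1.2.1 \
  --conf spark.sql.hive.metastore.jars=/path/to/hive-1.2.1-and-hadoop-client-jars/*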