I am importing data with sqoop from MySQL to hdfs as parquet file where it is used by Impala. There is a problem with conversion of MySQL DATE type to Impala TIMESTAMP.
Impala error message when executing compute stats table
or select *
is:
File 'hdfs://....parquet'
has an incompatible type with the table schema for column 'day'.
Expected type: INT32. Actual type: INT64
Changing data type of the day column to BIGINT or STRING leaves the error msg same.
Even when I change day column type in Impala to STRING and set --map-column-java "day=String" (also tried day=Integer and Long) in sqoop I get:
Expected type: INT32. Actual type: BYTE_ARRAY in Impala
I also tried setting mapDateToTimestamp=false (and true) in jdbc connection string but with no effect
What is the way to convert MySQL DATE to Impala TIMESTAMP with sqoop when using parquet file(note that I am not using AVRO)?
(sqoop version I use is 1.4.5-cdh5.3.3) Sqoop command looks like this:
sqoop import
--connect jdbc:mysql://adress/db
--username name
--password pass
--table tableName
--target-dir dir
--as-parquetfile -m 1
--driver com.mysql.jdbc.Driver
edit: I tried converting SQL Date with
--query "SELECT UNIX_TIMESTAMP(STR_TO_DATE(day, '%Y-%m-%d'))
but Impala sees it as INT64 and expects INT96. So another question is how to convert it to INT96?
Looks like the only way is to use --map-column-java "day=String"
and either have that column as STRING in Impala and use cast()
function for queries with day or use temporary table.