Tags: mysql, hadoop, timestamp, sqoop, impala

Sqoop DATE to Impala TIMESTAMP


I am importing data with Sqoop from MySQL to HDFS as a Parquet file, where it is then queried by Impala. There is a problem with the conversion of the MySQL DATE type to the Impala TIMESTAMP type.

The Impala error message when executing COMPUTE STATS or SELECT * on the table is:

File 'hdfs://....parquet'
has an incompatible type with the table schema for column 'day'.
Expected type: INT32.  Actual type: INT64

Changing the data type of the day column to BIGINT or STRING leaves the error message the same.

Even when I change the day column type in Impala to STRING and set --map-column-java "day=String" in Sqoop (I also tried day=Integer and day=Long), I get in Impala:

Expected type: INT32.  Actual type: BYTE_ARRAY

I also tried setting mapDateToTimestamp=false (and true) in the JDBC connection string, but with no effect.

What is the way to convert a MySQL DATE to an Impala TIMESTAMP with Sqoop when writing a Parquet file? (Note that I am not using Avro.)

(The Sqoop version I use is 1.4.5-cdh5.3.3.) The Sqoop command looks like this:

sqoop import \
  --connect jdbc:mysql://adress/db \
  --username name \
  --password pass \
  --table tableName \
  --target-dir dir \
  --as-parquetfile -m 1 \
  --driver com.mysql.jdbc.Driver

Edit: I tried converting the MySQL DATE with

--query "SELECT UNIX_TIMESTAMP(STR_TO_DATE(day, '%Y-%m-%d'))

but Impala sees it as INT64 while it expects INT96. So another question is: how do I convert it to INT96?
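
A minimal sketch of one possible alternative to producing INT96, assuming the Impala column for day is declared BIGINT (so it matches the INT64 values produced by UNIX_TIMESTAMP) and reusing the table name tableName from the command above; Impala interprets an integer cast to TIMESTAMP as seconds since the Unix epoch:

-- Sketch: read the Unix seconds as BIGINT and cast at query time
SELECT CAST(day AS TIMESTAMP) AS day_ts
FROM tableName;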


Solution

  • Looks like the only way is to use --map-column-java "day=String" and either keep that column as STRING in Impala and use the CAST() function in queries involving day, or go through a temporary table (see the sketch below).
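
A rough sketch of both options, assuming the import was run with --map-column-java "day=String" and that the Parquet-backed table is called tableName_staging (both table names below are hypothetical):

-- Option 1: keep day as STRING and cast at query time (day is a yyyy-MM-dd string)
SELECT CAST(day AS TIMESTAMP) AS day_ts
FROM tableName_staging;

-- Option 2: materialize a final table with a real TIMESTAMP column
CREATE TABLE tableName_final STORED AS PARQUET AS
SELECT CAST(day AS TIMESTAMP) AS day   -- plus the remaining columns unchanged
FROM tableName_staging;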