I am trying to import data from Oracle into a Hive table with a Sqoop incremental job, using the Parquet file format, but the job is failing with the error below:
Error: org.kitesdk.data.DatasetOperationException: Failed to append {"CLG_ID": "5",.....
19/03/27 00:37:06 INFO mapreduce.Job: Task Id : attempt_15088_130_m_000_2, Status : FAILED
Command to create the saved job:
sqoop job -Dhadoop.security.credential.provider.path=jceks://xxxxx --create job1 -- import --connect "jdbc:oracle:thinxxxxxx" --verbose --username user1 --password-alias alisas --query "select CLG_ID,.... from CLG_TBL where \$CONDITIONS" --as-parquetfile --incremental append --check-column CLG_TS --target-dir /hdfs/clg_data/ -m 1
Command to execute the import:
sqoop job -Dhadoop.security.credential.provider.path=jceks:/xxxxx --exec job1 -- --connect "jdbc:oracle:xxx" --username user1 --password-alias alisas --query "select CLG_ID,.... from CLG_TBL where \$CONDITIONS" --target-dir /hdfs/clg_data/ -m 1 --hive-import --hive-database clg_db --hive-table clg_table --as-parquetfile
This error is a known issue. We ran into the same problem a couple of weeks ago and found the explanation below.
Here is the link.
Description of the problem or behavior
In HDP 3, managed Hive tables must be transactional (hive.strict.managed.tables=true). Transactional tables with Parquet format are not supported by Hive. Hive imports with --as-parquetfile must use external tables by specifying --external-table-dir.
Associated error message
Table db.table failed strict managed table checks due to the following reason: Table is marked as a managed table but is not transactional.
Workaround
When using --hive-import with --as-parquetfile, users must also provide --external-table-dir with a fully qualified location of the table:
sqoop import ... --hive-import --as-parquetfile --external-table-dir hdfs:///path/to/table
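Applied to the job in your question, the exec command would look roughly like this (the HDFS path after --external-table-dir is only a placeholder; point it at wherever the table's data should live):
sqoop job -Dhadoop.security.credential.provider.path=jceks:/xxxxx --exec job1 -- --connect "jdbc:oracle:xxx" --username user1 --password-alias alisas --query "select CLG_ID,.... from CLG_TBL where \$CONDITIONS" --target-dir /hdfs/clg_data/ -m 1 --hive-import --hive-database clg_db --hive-table clg_table --as-parquetfile --external-table-dir hdfs:///hdfs/clg_data/clg_table
With this in place, Sqoop creates clg_db.clg_table as an external Hive table backed by the given location instead of a managed table, so the strict managed table check no longer rejects the Parquet data.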