Search code examples
hadoopavrosqoop

Sqoop Import failing while imporing AVRO data from SQL Server to HDFS


I am new to AVRO and I am trying to import AVRO format data from SQL Server to HDFS.

Error: org.kitesdk.data.DatasetOperationException: Failed to append {"id": "D22C2475", "create_date": "2020-08-22 14:34:06.0", "modified_date": "2020-08-22 14:34:06.0"} to ParquetAppender{path=job_1597813536070/mr/attempt_1597813536070_m_000000_0/.d55262cf-e49b-4378-addc-0f85698efb47.parquet.tmp">hdfs://nameservice1/tmp/schema/.temp/job_1597813536070/mr/attempt_1597813536070_m_000000_0/.d55262cf-e49b-4378-addc-0f85698efb47.parquet.tmp, schema={"type":"record","name":"AutoGeneratedSchema","doc":"Sqoop import of QueryResult","fields":[{"name":"id","type":["null","string"],"default":null,"columnName":"id","sqlType":"1"},{"name":"create_date","type":["null","long"],"default":null,"columnName":"create_date","sqlType":"93"},{"name":"modified_date","type":["null","long"],"default":null,"columnName":"modified_date","sqlType":"93"}],"tableName":"QueryResult"}, fileSystem=DFS[DFSClient[clientName=DFSClient_attempt_1597813536070_m_000000_0_960843231_1, ugi=username (auth:SIMPLE)]], avroParquetWriter=parquet.avro.AvroParquetWriter@7b122839} Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Number

TABLE - CREATE TABLE “ticket”( id string, create_date string, modified_date string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' TBLPROPERTIES ( 'COLUMN_STATS_ACCURATE'='true', 'avro.schema.url'='hdfs://nameservice1/user/hive/warehouse/schema.db/ticket/.metadata/schemas/1.avsc', 'kite.compression.type'='snappy');

AVRO file metadata - hdfs://nameservice1/user/hive/warehouse/schema.db/ticket/.metadata/schemas/1.avsc' { "type" : "record", "name" : "AutoGeneratedSchema", "doc" : "Sqoop import of QueryResult", "fields" : [ { "name" : "id", "type" : [ "null", "string" ], "default" : null, "columnName" : "id", "sqlType" : "1" }, { "name" : "create_date", "type" : [ "null", "string" ], "default" : null, "columnName" : "create_date", "sqlType" : "93" }, { "name" : "modified_date", "type" : [ "null", "string" ], "default" : null, "columnName" : "modified_date", "sqlType" : "93" }], "tableName" : "QueryResult" }


Solution

  • I fixed the issue. There was some issue with my AVRO metadata file. I recreated it and add it in Hive table with below command.

    alter table table_name set serdeproperties ('avro.schema.url' = 'hdfs://user/hive/warehouse/schema.db/table_name/1.avsc');