Tags: java, mapreduce, sqoop, hadoop2, avro

Importing data as Avro fails with Sqoop 1.4.7 and Hadoop 2.7.3


I am dealing with an issue where I am trying to import a vast amount of data from an on-premise PostgreSQL slave replica to Google Cloud Storage in Avro format using Apache Sqoop.

Importing data with the default format works just fine, but my data pipeline requires the data in Avro format. However, this keeps failing for a reason that has been reported many times in the past, for example:

I have tried using the argument -Dmapreduce.job.user.classpath.first=true as instructed in the aforementioned questions, but the error is still:

java.lang.Exception: java.lang.NoSuchMethodError: org.apache.avro.reflect.ReflectData.addLogicalTypeConversion(Lorg/apache/avro/Conversion;)V

This method appears to have been added in Avro 1.8.0, but some dependencies pull in an older version of Avro where it is not available.
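One quick way to confirm the clash is to list every Avro jar that ends up on Hadoop's classpath. A minimal check, assuming hadoop is on your PATH (--glob expands the wildcard classpath entries):

# Print each classpath entry on its own line and keep only the Avro jars;
# an avro-1.7.x hit would explain the NoSuchMethodError, since
# ReflectData.addLogicalTypeConversion only exists from Avro 1.8.0 onwards.
hadoop classpath --glob | tr ':' '\n' | grep 'avro-.*\.jar'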

My environment has the following versions of these tools:

  • Hadoop 2.7.3.2.6.3.0-235
  • Sqoop 1.4.7
  • javac 1.8.0_191
  • sqoop/lib/parquet-avro-1.6.0.jar
  • sqoop/lib/avro-1.8.1.jar
  • sqoop/lib/avro-mapred-1.8.1-hadoop2.jar

Has anyone else faced this same issue, where adding -Dmapreduce.job.user.classpath.first=true to sqoop import does not solve it?

# Command I'm running
sqoop import -Dmapreduce.job.user.classpath.first=true \
-Dsqoop.export.records.per.statement=1 \
--connect jdbc:postgresql://XX.XX.X.XX/db \
--username postgres \
--password XXXX \
--table FOO \
--target-dir gs://test-bucket/test/ \
--as-avrodatafile \
2>&1 | tee -a /home/userA/logs/test.log

Solution

  • I have encountered the same problem. My configuration is identical, except that I have Hadoop 2.9.2.

    When I replaced the original

    ${HADOOP_HOME}/share/hadoop/common/lib/avro-1.7.7.jar
    

    with the avro-1.8.1.jar that ships with Sqoop 1.4.7, the import succeeded.

    I have not yet tested any other Avro operations since changing the Avro jar.
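
    For reference, the swap itself is just two file operations. A rough sketch, assuming the default Hadoop 2.x and Sqoop 1.4.7 directory layouts (HADOOP_HOME and SQOOP_HOME are assumed to point at your installs; adjust the paths as needed, and keep the backup so you can roll back if other Avro consumers break):

    # Move Hadoop's bundled Avro 1.7.7 aside instead of deleting it
    # (the .bak file is ignored by the classpath wildcard), then copy
    # in the Avro 1.8.1 jar that ships with Sqoop 1.4.7.
    mv "${HADOOP_HOME}/share/hadoop/common/lib/avro-1.7.7.jar" \
       "${HADOOP_HOME}/share/hadoop/common/lib/avro-1.7.7.jar.bak"
    cp "${SQOOP_HOME}/lib/avro-1.8.1.jar" \
       "${HADOOP_HOME}/share/hadoop/common/lib/"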