I am dealing with an issue where I am trying to import a vast amount of data from an on-premise PostgreSQL slave replica to Google Cloud Storage in Avro format using Apache Sqoop.
Importing data with the default formats works just fine, but my data pipeline requires importing the data in Avro format. However, this keeps failing for a reason that has been reported many times in the past, for example:
I have tried using the argument -Dmapreduce.job.user.classpath.first=true
as instructed in the aforementioned questions, but the error is still:
java.lang.Exception: java.lang.NoSuchMethodError: org.apache.avro.reflect.ReflectData.addLogicalTypeConversion(Lorg/apache/avro/Conversion;)V
This method appears to have been added in Avro 1.8.0, but some dependencies are pulling in an older version of Avro where it is not available.
My environment has the following versions of these tools:
Has anyone else faced this same issue, where adding -Dmapreduce.job.user.classpath.first=true
to sqoop import
doesn't solve the problem?
# Command I'm running
sqoop import -Dmapreduce.job.user.classpath.first=true \
-Dsqoop.export.records.per.statement=1 \
--connect jdbc:postgresql://XX.XX.X.XX/db \
--username postgres \
--password XXXX \
--table FOO \
--target-dir gs://test-bucket/test/ \
--as-avrodatafile \
2>&1 | tee -a /home/userA/logs/test.log
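To see which Avro versions are actually in play, it can help to list every Avro jar that Hadoop and Sqoop can see before running the import. A minimal sketch, assuming HADOOP_HOME and SQOOP_HOME point at your installations:

```shell
# List all Avro jars under the given install directories so conflicting
# versions (e.g. avro-1.7.7.jar vs avro-1.8.1.jar) are easy to spot.
find_avro_jars() {
  find "$@" -name 'avro-*.jar' 2>/dev/null
}

# Usage (adjust paths to your environment):
# find_avro_jars "$HADOOP_HOME" "$SQOOP_HOME"
```

If this prints more than one Avro version, that mismatch is the likely source of the NoSuchMethodError.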
I have encountered the same problem. My configuration is identical except that I have Hadoop 2.9.2.
When I replaced the original
${HADOOP_HOME}/share/hadoop/common/lib/avro-1.7.7.jar
with avro-1.8.1.jar
that came with Sqoop 1.4.7, the import succeeded.
I have not yet tested any other Avro operations since swapping the jar.
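The swap described above can be scripted roughly like this. This is a sketch only: the jar versions, the backup name, and the HADOOP_HOME/SQOOP_HOME directory layout are assumptions that may differ in your installation:

```shell
# Replace Hadoop's bundled Avro jar with the newer one shipped by Sqoop.
# $1 = Hadoop install dir, $2 = Sqoop install dir.
swap_avro_jar() {
  local hadoop_lib="$1/share/hadoop/common/lib"
  # Keep a backup of the old jar instead of deleting it outright.
  mv "${hadoop_lib}"/avro-1.7.*.jar "${hadoop_lib}/avro-1.7.jar.bak"
  cp "$2"/lib/avro-1.8.*.jar "${hadoop_lib}/"
}

# Usage (adjust paths): swap_avro_jar "$HADOOP_HOME" "$SQOOP_HOME"
```

Keeping the backup makes it easy to revert if other Hadoop jobs turn out to depend on the older Avro.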