hadoop, apache-spark, avro

Error when launching spark-submit because of Avro


I am creating an application in Spark. I use Avro files in HDFS with Hadoop 2. I use Maven, and I include Avro like this:

<dependency>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro-mapred</artifactId>
    <version>1.7.6</version>
    <classifier>hadoop2</classifier>
</dependency>

I wrote a unit test, and everything works when I run mvn test. But when I launch with spark-submit it does not, and I get this error:

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 1 times, most recent failure: Lost task 1.0 in stage 0.0 (TID 1, localhost): java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
    at org.apache.avro.mapreduce.AvroKeyInputFormat.createRecordReader(AvroKeyInputFormat.java:47)
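For reference, the read that hits AvroKeyInputFormat.createRecordReader looks roughly like this (a minimal sketch; the input path and the use of GenericRecord are illustrative assumptions, not my exact code):

import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapreduce.AvroKeyInputFormat
import org.apache.hadoop.io.NullWritable
import org.apache.spark.{SparkConf, SparkContext}

object AvroReadSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("avro-read-sketch"))

    // newAPIHadoopFile eventually calls AvroKeyInputFormat.createRecordReader,
    // which is the frame where the IncompatibleClassChangeError is thrown.
    val records = sc.newAPIHadoopFile(
      "hdfs:///path/to/avro",                      // hypothetical input directory
      classOf[AvroKeyInputFormat[GenericRecord]],
      classOf[AvroKey[GenericRecord]],
      classOf[NullWritable])

    // A simple action to force the read.
    println(records.count())
    sc.stop()
  }
}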

Can you help me?

Thank you


Solution

  • OK, I found the solution :D Thanks to http://apache-spark-developers-list.1001551.n3.nabble.com/Fwd-Unable-to-Read-Write-Avro-RDD-on-cluster-td10893.html.

    The solution is to add the hadoop2 jars to your SPARK_CLASSPATH, so they take precedence over the hadoop1 build of avro-mapred that causes the error:

    export SPARK_CLASSPATH=yourpath/avro-mapred-1.7.7-hadoop2.jar:yourpath/avro-1.7.7.jar
    

    You can download the jars here: http://repo1.maven.org/maven2/org/apache/avro/avro-mapred/1.7.7/ (an end-to-end launch sketch follows below).
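
End to end, the fix looks roughly like this (the jar locations, the application class and the application jar name are hypothetical):

# Put the hadoop2 Avro jars on the classpath before launching.
export SPARK_CLASSPATH=/opt/libs/avro-mapred-1.7.7-hadoop2.jar:/opt/libs/avro-1.7.7.jar

# Launch as usual; Spark adds the SPARK_CLASSPATH entries to the driver and executor classpaths.
spark-submit --class com.example.MyAvroApp --master local[*] target/my-app-1.0.jar

On newer Spark releases SPARK_CLASSPATH is deprecated; --driver-class-path and spark.executor.extraClassPath achieve the same effect.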