Tags: scala, apache-spark, ignite

Spark unable to perform operations on Ignite RDD: InvalidClassException org.apache.ignite.spark.impl.IgnitePartition; local class incompatible


I am testing out the Ignite-Spark integration and running into issues when Spark performs operations on an Ignite RDD. The Spark master and worker are up and running, along with a 2-node Ignite cluster, but any action on the shared RDD fails with an InvalidClassException. Please suggest what can be done to overcome this.

Spark version 2.3.0. Ignite version 2.10.0

The Spark shell is started with:

spark-2.3.0-bin-hadoop2.7]# ./bin/spark-shell --packages org.apache.ignite:ignite-spark:1.8.0,org.apache.ignite:ignite-spring:1.8.0  --conf spark.driver.extraClassPath=/opt/apache-ignite-2.10.0-bin/libs/*:/opt/apache-ignite-2.10.0-bin/libs/optional/ignite-spark/*:/opt/apache-ignite-2.10.0-bin/libs/optional/ignite-log4j/*:/opt/apache-ignite-2.10.0-bin/libs/optional/ignite-yarn/*:/opt/apache-ignite-2.10.0-bin/libs/ignite-spring/* --jars /opt/apache-ignite-2.10.0-bin/libs/ignite-core-2.10.0.jar,/opt/apache-ignite-2.10.0-bin/libs/ignite-spark-2.10.0.jar,/opt/apache-ignite-2.10.0-bin/libs/cache-api-1.0.0.jar,/opt/apache-ignite-2.10.0-bin/libs/ignite-log4j-2.10.0.jar,/opt/apache-ignite-2.10.0-bin/libs/log4j-1.2.17.jar,/opt/apache-ignite-2.10.0-bin/libs/ignite-spring/ignite-spring-2.10.0.jar --master spark://xx.xx.xx.xx:7077
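
Note that the --packages coordinates above request version 1.8.0 of ignite-spark and ignite-spring, while the --jars and extraClassPath entries point at the 2.10.0 distribution. A quick way to confirm which jar a JVM actually loaded a class from (a generic JVM check, added for illustration; it is not part of the original session) is:

    // Run inside spark-shell on the driver; prints the jar that provided
    // IgnitePartition. getCodeSource can be null for JDK bootstrap classes,
    // but not for classes loaded from application jars like this one.
    println(classOf[org.apache.ignite.spark.impl.IgnitePartition]
      .getProtectionDomain.getCodeSource.getLocation)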

The exception is attached below:

scala> val ic = new IgniteContext(sc, "/opt/apache-ignite-2.10.0-bin/examples/config/spark/example-shared-rdd.xml")
2021-08-18 11:19:35 WARN  IgniteKernal:576 - Please set system property '-Djava.net.preferIPv4Stack=true' to avoid possible problems in mixed environments.
2021-08-18 11:19:35 WARN  TcpCommunicationSpi:576 - Message queue limit is set to 0 which may lead to potential OOMEs when running cache operations in FULL_ASYNC or PRIMARY_SYNC modes due to message queues growth on sender and receiver sides.
2021-08-18 11:19:35 WARN  NoopCheckpointSpi:576 - Checkpoints are disabled (to enable configure any GridCheckpointSpi implementation)
2021-08-18 11:19:35 WARN  GridCollisionManager:576 - Collision resolution is disabled (all jobs will be activated upon arrival).
2021-08-18 11:19:35 WARN  IgniteH2Indexing:576 - Serialization of Java objects in H2 was enabled.
2021-08-18 11:19:37 WARN  GridClusterStateProcessor:576 - Received state finish message with unexpected ID: ChangeGlobalStateFinishMessage [id=36bcec75b71-46c2fd15-4c79-4b79-a1d5-998b63dc7dd6, reqId=bfecc55e-bbe0-4451-95ef-3f8020b7b97e, state=ACTIVE, transitionRes=true]
ic: org.apache.ignite.spark.IgniteContext = org.apache.ignite.spark.IgniteContext@5af7e2f

scala> val sharedRDD = ic.fromCache[Integer, Integer]("partitioned")
sharedRDD: org.apache.ignite.spark.IgniteRDD[Integer,Integer] = IgniteRDD[0] at RDD at IgniteAbstractRDD.scala:32

scala> sharedRDD.filter(_._2 < 10).collect()
[Stage 0:>                                                       (0 + 4) / 1024]

2021-08-18 11:20:08 WARN  TaskSetManager:66 - Lost task 3.0 in stage 0.0 (TID 3, 172.16.8.181, executor 0): java.io.InvalidClassException: org.apache.ignite.spark.impl.IgnitePartition; local class incompatible: stream classdesc serialVersionUID = -2372563944236654956, local class serialVersionUID = -7759805985368763313
            at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:699)
            at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2001)
            at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1848)
            at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2158)
            at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1665)
            at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2403)
            at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2327)
            at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2185)
            at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1665)
            at java.io.ObjectInputStream.readObject(ObjectInputStream.java:501)
            at java.io.ObjectInputStream.readObject(ObjectInputStream.java:459)
            at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
            at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
            at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:313)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
            at java.lang.Thread.run(Thread.java:748)
    
    2021-08-18 11:20:08 ERROR TaskSetManager:70 - Task 0 in stage 0.0 failed 4 times; aborting job
    org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 25, 172.16.8.181, executor 0): java.io.InvalidClassException: org.apache.ignite.spark.impl.IgnitePartition; local class incompatible: stream classdesc serialVersionUID = -2372563944236654956, local class serialVersionUID = -7759805985368763313
            at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:699)
            at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2001)
            at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1848)
            at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2158)
            at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1665)
            at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2403)
            at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2327)
            at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2185)
            at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1665)
            at java.io.ObjectInputStream.readObject(ObjectInputStream.java:501)
            at java.io.ObjectInputStream.readObject(ObjectInputStream.java:459)
            at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
            at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
            at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:313)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
            at java.lang.Thread.run(Thread.java:748)
    
    Driver stacktrace:
      at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1599)
      at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1587)
      at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1586)
      at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
      at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
      at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1586)
      at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
      at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
      at scala.Option.foreach(Option.scala:257)
      at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:831)
      at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1820)
      at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1769)
      at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1758)
      at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
      at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:642)
      at org.apache.spark.SparkContext.runJob(SparkContext.scala:2027)
      at org.apache.spark.SparkContext.runJob(SparkContext.scala:2048)
      at org.apache.spark.SparkContext.runJob(SparkContext.scala:2067)
      at org.apache.spark.SparkContext.runJob(SparkContext.scala:2092)
      at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:939)
      at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
      at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
      at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
      at org.apache.spark.rdd.RDD.collect(RDD.scala:938)
      ... 53 elided
    Caused by: java.io.InvalidClassException: org.apache.ignite.spark.impl.IgnitePartition; local class incompatible: stream classdesc serialVersionUID = -2372563944236654956, local class serialVersionUID = -7759805985368763313
      at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:699)
      at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2001)
      at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1848)
      at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2158)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1665)
      at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2403)
      at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2327)
      at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2185)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1665)
      at java.io.ObjectInputStream.readObject(ObjectInputStream.java:501)
      at java.io.ObjectInputStream.readObject(ObjectInputStream.java:459)
      at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
      at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
      at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:313)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      at java.lang.Thread.run(Thread.java:748)
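
For reference, Ignite's shared RDD example populates the cache before querying it; a sketch of that step (the data below is illustrative, as the population step is not shown in the session above):

    // Illustrative population step in the spirit of Ignite's SharedRDDExample:
    // write Integer pairs into the shared cache before filtering it.
    val data = sc.parallelize(1 to 1000, 10)
      .map(i => (Integer.valueOf(i), Integer.valueOf(i)))
    sharedRDD.savePairs(data)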

example-shared-rdd.xml

<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="
        http://www.springframework.org/schema/beans
        http://www.springframework.org/schema/beans/spring-beans.xsd">

    <bean id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
        <property name="cacheConfiguration">
            <!-- SharedRDD cache example configuration (Atomic mode). -->
            <bean class="org.apache.ignite.configuration.CacheConfiguration">
                <!-- Set a cache name. -->
                <property name="name" value="sharedRDD"/>
                <!-- Set a cache mode. -->
                <property name="cacheMode" value="PARTITIONED"/>
                <!-- Index Integer pairs used in the example. -->
                <property name="indexedTypes">
                    <list>
                        <value>java.lang.Integer</value>
                        <value>java.lang.Integer</value>
                    </list>
                </property>
                <!-- Set atomicity mode. -->
                <property name="atomicityMode" value="ATOMIC"/>
                <!-- Configure a number of backups. -->
                <property name="backups" value="1"/>
            </bean>
        </property>
        <property name="workDirectory" value="/opt/apache-ignite-2.10.0-bin/ignitework"/>
        <property name="consistentId" value="Spark test1"/>
        <!-- Explicitly configure TCP discovery SPI to provide list of initial nodes. -->
        <property name="discoverySpi">
            <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">

                <property name="ipFinder">
                    <!--
                        Ignite provides several options for automatic discovery that can be used
                        instead of static IP-based discovery. For information on all options, refer
                        to our documentation: http://apacheignite.readme.io/docs/cluster-config
                    -->
                    <!-- Uncomment static IP finder to enable static-based discovery of initial nodes. -->
                    <!--<bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">-->
                    <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.multicast.TcpDiscoveryMulticastIpFinder">
                        <property name="addresses">
                            <list>
                                <!-- In distributed environment, replace with actual host IP address. -->
                                <value>IP1:47500..47509</value>
                                <value>IP2:47500..47509</value>

                            </list>
                        </property>
                    </bean>
                </property>
            </bean>
        </property>
    </bean>
</beans>

Kindly help with a solution to this issue.

Thanks in advance.


Solution

  • The issue was fixed by specifying the matching package versions when starting the Spark shell:

    --packages org.apache.ignite:ignite-spark:2.10.0,org.apache.ignite:ignite-spring:2.10.0
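
  • Why this fixes it: the original command pulled org.apache.ignite:ignite-spark:1.8.0 via --packages, while --jars and spark.driver.extraClassPath pointed at the 2.10.0 distribution. The driver and the executors could therefore end up loading different versions of org.apache.ignite.spark.impl.IgnitePartition, whose serialVersionUIDs differ, which is exactly what the InvalidClassException reports. (This explanation is inferred from the command line and the error above.)

  • With the versions aligned, the invocation from the question becomes (unchanged apart from the --packages coordinates):

    ./bin/spark-shell --packages org.apache.ignite:ignite-spark:2.10.0,org.apache.ignite:ignite-spring:2.10.0 --conf spark.driver.extraClassPath=/opt/apache-ignite-2.10.0-bin/libs/*:/opt/apache-ignite-2.10.0-bin/libs/optional/ignite-spark/*:/opt/apache-ignite-2.10.0-bin/libs/optional/ignite-log4j/*:/opt/apache-ignite-2.10.0-bin/libs/optional/ignite-yarn/*:/opt/apache-ignite-2.10.0-bin/libs/ignite-spring/* --jars /opt/apache-ignite-2.10.0-bin/libs/ignite-core-2.10.0.jar,/opt/apache-ignite-2.10.0-bin/libs/ignite-spark-2.10.0.jar,/opt/apache-ignite-2.10.0-bin/libs/cache-api-1.0.0.jar,/opt/apache-ignite-2.10.0-bin/libs/ignite-log4j-2.10.0.jar,/opt/apache-ignite-2.10.0-bin/libs/log4j-1.2.17.jar,/opt/apache-ignite-2.10.0-bin/libs/ignite-spring/ignite-spring-2.10.0.jar --master spark://xx.xx.xx.xx:7077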