
PyFlink "pipeline.classpaths" vs $FLINK_HOME/lib


What is the difference, in terms of class loading, between passing jars through the PyFlink pipeline.classpaths config and putting them into the $FLINK_HOME/lib directory?

When I use flink-sql-connector-kafka-*.jar, passing it through pipeline.classpaths works fine. But when I use something with external dependencies, such as flink-avro-*.jar, which needs avro-*.jar, it seems to load flink-avro-*.jar but fails to load avro-*.jar and throws:

java.lang.NoClassDefFoundError: Could not initialize class org.apache.avro.SchemaBuilder

When I add avro-*.jar to $FLINK_HOME/lib, it works just fine.


Solution

  • NoClassDefFoundError and ClassNotFoundException are different:

    1. java.lang.ClassNotFoundException: This exception indicates that code tried to load a class definition by name and the class did not exist on the classpath.
    2. java.lang.NoClassDefFoundError: This error indicates that the JVM needed the definition of a class and could not obtain a usable one. This is different from saying that it could not be loaded from the classpath. In particular, the "Could not initialize class" message is raised when an earlier attempt to run the static initializer of that class failed, so the class file was found but the class cannot be used. The point is, a NoClassDefFoundError is not necessarily a classpath problem.
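The difference between the two can be reproduced without Flink or Avro. The following is a minimal sketch, not taken from the Flink code base: the class names ClassLoadingErrors and BrokenInit, the helper errorName, and the class name com.example.DoesNotExist are all hypothetical and exist only for illustration. BrokenInit stands in for a class whose static initializer fails, which is one plausible way to end up with "Could not initialize class".

```java
import java.util.concurrent.Callable;

public class ClassLoadingErrors {

    // Hypothetical stand-in for a class whose static initializer fails,
    // for example because something it depends on is missing or incompatible.
    static class BrokenInit {
        // Not a compile-time constant, so reading it triggers class initialization.
        static final int VALUE = compute();

        private static int compute() {
            throw new IllegalStateException("simulated failure in static initializer");
        }
    }

    // Runs the action and returns the class name of whatever it throws, or "none".
    static String errorName(Callable<Object> action) {
        try {
            action.call();
            return "none";
        } catch (Throwable t) {
            return t.getClass().getName();
        }
    }

    public static void main(String[] args) {
        // Case 1: the class does not exist anywhere on the classpath,
        // so loading it by name raises ClassNotFoundException.
        System.out.println(errorName(() -> Class.forName("com.example.DoesNotExist")));

        // Case 2: the class file is found, but its static initializer throws.
        // The first use reports the initializer failure itself.
        System.out.println(errorName(() -> BrokenInit.VALUE));

        // Every later use of the same class reports NoClassDefFoundError
        // with the message "Could not initialize class ...".
        System.out.println(errorName(() -> BrokenInit.VALUE));
    }
}
```

When the error in a job log reads "Could not initialize class", it is therefore worth searching earlier in the same log for an ExceptionInInitializerError, since that first failure usually carries the real cause.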

    flink-sql-avro-*.jar is a shaded jar that bundles org.apache.flink:flink-avro together with org.apache.avro:avro and relocates their paths.

    Judging from the NoClassDefFoundError, there may be a conflict between the Avro versions pulled in by the dependencies.
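One way to check for such a conflict is to ask the JVM which jar a class actually came from. The sketch below is an assumption about how one might diagnose this, not an official Flink tool: WhichJar and locate are hypothetical names, and the snippet uses the thread context class loader, which may or may not be the loader Flink uses for user code in a given setup.

```java
import java.security.CodeSource;

public class WhichJar {

    // Returns the jar or directory a class was loaded from, or a marker string
    // when the location is unavailable or the class cannot be loaded.
    static String locate(String className) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = WhichJar.class.getClassLoader();
        }
        try {
            // initialize=false: only load the class, so a failing static
            // initializer does not get in the way of finding its origin.
            Class<?> cls = Class.forName(className, false, loader);
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            return source == null ? "<bootstrap or unknown>" : String.valueOf(source.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "<not loadable: " + e + ">";
        }
    }

    public static void main(String[] args) {
        // Defaults to the class named in the error from the question.
        String name = args.length > 0 ? args[0] : "org.apache.avro.SchemaBuilder";
        System.out.println(name + " -> " + locate(name));
    }
}
```

Running this with the same jars on the classpath as the failing job, once for org.apache.avro.SchemaBuilder and once for a class from flink-avro, would show whether the two come from jars with mismatched Avro versions.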