Search code examples
javamavennoclassdeffounderroravroapache-flink

Avro looking for SnappyCodec is throwing NoClassDefFoundError


I have looked all over the web and found this is a pretty common error, but no solution has helped me.

I am reading from a kafka topic. Up until now I haven't had an issue doing so, but now I am getting this error when I run on the flink cluster up in the aws environment, but not in my IDE (intellij):

NoClassDefFoundError: org/xerial/snappy/Snappy
at org.apache.avro.file.SnappyCodec.decompress(SnappyCodec.java:58)
at org.apache.avro.file.DataFileStream$DataBlock.decompressUsing(DataFileStream.java:352)
at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:199)
at flink.streaming.mtsas.functions.AvroDeserializationSchema.deserialize(AvroDeserializationSchema.java:37)
at org.apache.flink.streaming.util.serialization.KeyedDeserializationSchemaWrapper.deserialize(KeyedDeserializationSchemaWrapper.java:39)
at org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher.runFetchLoop(Kafka09Fetcher.java:145)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:255)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:87)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:55)
at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:95)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:262)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702)
at java.lang.Thread.run(Thread.java:745)

From what I can find online it sounds like the usual reason has to do with different version being compiled than what is expected, to put it simply. But I am just at a loss. Here is the pom.xml as well:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>

<parent>
    <groupId>com.group.version1</groupId>
    <artifactId>parent</artifactId>
    <version>1.0-SNAPSHOT</version>
</parent>

<artifactId>this.artifact</artifactId>

<properties>
    <flink.version>1.3.0</flink.version>
    <avro.version>1.8.1</avro.version>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
</properties>

<dependencies>

    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-java</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-connector-nifi_2.11</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-streaming-java_2.11</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-core</artifactId>
        <version>2.8.8</version>
    </dependency>

    <dependency>
        <groupId>com.google.code.gson</groupId>
        <artifactId>gson</artifactId>
        <version>2.8.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-core</artifactId>
        <version>2.8.1</version>
    </dependency>
    <dependency>
        <groupId>org.codehaus.groovy</groupId>
        <artifactId>groovy-all</artifactId>
        <version>2.4.5</version>
    </dependency>
    <dependency>
        <groupId>com.google.guava</groupId>
        <artifactId>guava</artifactId>
        <version>21.0</version>
    </dependency>
    <dependency>
        <groupId>com.googlecode.json-simple</groupId>
        <artifactId>json-simple</artifactId>
        <version>1.1.1</version>
    </dependency>

    <dependency>
        <groupId>org.apache.avro</groupId>
        <artifactId>avro</artifactId>
        <version>${avro.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.avro</groupId>
        <artifactId>avro-maven-plugin</artifactId>
        <version>${avro.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-connector-kafka-0.10_2.11</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-connector-cassandra_2.11</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-connector-filesystem_2.10</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-avro</artifactId>
        <version>0.10.2</version>
    </dependency>

    <dependency>
        <groupId>com.datastax.cassandra</groupId>
        <artifactId>cassandra-driver-core</artifactId>
        <version>3.2.0</version>
    </dependency>

    <dependency>
        <groupId>com.blah</groupId>
        <artifactId>custom-resources</artifactId>
        <version>1.0</version>
    </dependency>
    <dependency>
        <groupId>com.blah</groupId>
        <artifactId>custom-executor</artifactId>
        <version>1.0</version>
    </dependency>

    <dependency>
        <groupId>net.sf.dozer</groupId>
        <artifactId>dozer</artifactId>
        <version>5.4.0</version>
    </dependency>

    <dependency>
        <groupId>com.group.version1</groupId>
        <artifactId>test-utils</artifactId>
        <scope>test</scope>
        <exclusions>
            <exclusion>
                <groupId>org.slf4j</groupId>
                <artifactId>slf4j-simple</artifactId>
            </exclusion>
        </exclusions>
    </dependency>

   <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-statebackend-rocksdb_2.11</artifactId>
        <version>1.2.1</version>
    </dependency>

    <dependency>
        <groupId>com.datastax.cassandra</groupId>
        <artifactId>cassandra-driver-dse</artifactId>
        <version>3.0.0-rc1</version>
    </dependency>
    <dependency>
        <groupId>com.datastax.cassandra</groupId>
        <artifactId>dse-driver</artifactId>
        <version>1.1.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.7.2</version>
    </dependency>
</dependencies>

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.avro</groupId>
            <artifactId>avro-maven-plugin</artifactId>
            <version>${avro.version}</version>
            <executions>
                <execution>
                    <phase>generate-sources</phase>
                    <goals>
                        <goal>schema</goal>
                    </goals>
                    <configuration>
                        <sourceDirectory>${project.basedir}/src/main/customavro/</sourceDirectory>
                        <outputDirectory>${project.basedir}/src/main/java/</outputDirectory>
                    </configuration>
                </execution>
            </executions>
        </plugin>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                    <configuration>
                        <transformers>
                            <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                <manifestEntries>
                                    <Main-Class>flink.streaming.custom.CustomProcessor</Main-Class>
                                </manifestEntries>
                            </transformer>
                        </transformers>
                        <filters>
                            <filter>
                                <artifact>*:*</artifact>
                                <excludes>
                                    <exclude>META-INF/*.SF</exclude>
                                    <exclude>META-INF/*.DSA</exclude>
                                    <exclude>META-INF/*.RSA</exclude>
                                </excludes>
                            </filter>
                        </filters>
                    </configuration>
                </execution>
            </executions>
        </plugin>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-install-plugin</artifactId>
            <version>2.5</version>
            <executions>
                <execution>
                    <id>inst_1</id>
                    <phase>clean</phase>
                    <goals>
                        <goal>install-file</goal>
                    </goals>
                    <configuration>
                        <groupId>com.blah</groupId>
                        <artifactId>custom-resources</artifactId>
                        <version>1.0</version>
                        <packaging>jar</packaging>
                        <file>${basedir}/lib/custom_resources-1.0.jar</file>
                    </configuration>
                </execution>
                <execution>
                    <id>inst_2</id>
                    <phase>clean</phase>
                    <goals>
                        <goal>install-file</goal>
                    </goals>
                    <configuration>
                        <groupId>com.blah</groupId>
                        <artifactId>custom-executor</artifactId>
                        <version>1.0</version>
                        <packaging>jar</packaging>
                        <file>${basedir}/lib/custom-executor-1.0.jar</file>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>

I am sure it isn't that hard of a thing, but I really have just hit the point where you end up doing more harm than good trying one thing after another.

Thanks for any help in advance!

EDIT:

Also should add that I can find Snappy.java in my IDE at that path (org.xerial.snappy.Snappy).


Solution

  • Upon talking with a contact at data artisans (creators of Flink), it turns out there is a bug with 1.3 that is the reason for this. Here is the link to the bug report https://issues.apache.org/jira/browse/FLINK-6965.