Tags: maven, hadoop, apache-flink

Need to copy flink-hadoop-compatibility-2.10 jar explicitly to ${FLINK-HOME}/lib location on EMR cluster


I am currently working on a Flink application that uses some Hadoop dependencies to write data to an S3 location. It works fine in my local environment, but when I deploy the application on an EMR cluster it throws a compatibility-related exception.

The error message I am getting is:

java.lang.RuntimeException: Could not load the TypeInformation for the class 'org.apache.hadoop.io.Writable'. You may be missing the 'flink-hadoop-compatibility' dependency.
    at org.apache.flink.api.java.typeutils.TypeExtractor.createHadoopWritableTypeInfo(TypeExtractor.java:2025)
    at org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForClass(TypeExtractor.java:1649)
    at org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForClass(TypeExtractor.java:1591)
    at org.apache.flink.api.java.typeutils.TypeExtractor.createTypeInfoWithTypeHierarchy(TypeExtractor.java:778)
    ....

I have included the Maven dependency for the flink-hadoop-compatibility_2.10 jar in my POM, but it is not being picked up. The Flink version I am using is 1.2.0.
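For reference, the dependency declaration looks like the following sketch; the exact coordinates are an assumption based on the Flink 1.2.0 release artifacts, where the artifactId carries the Scala version suffix (_2.10):

```xml
<!-- Assumed coordinates for the Flink 1.2.0 / Scala 2.10 build -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-hadoop-compatibility_2.10</artifactId>
    <version>1.2.0</version>
</dependency>
```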

However, when I explicitly copy the compatibility JAR to the ${FLINK-HOME}/lib location, the exception goes away and I am able to run the Flink application successfully.


Is there any way to run the application without deploying the JAR file to ${FLINK-HOME}/lib?

OR

What modifications are required in the POM dependencies so that the application detects the jar and copying it to ${FLINK-HOME}/lib is no longer necessary?


Solution

  • After looking into various posts and experimenting with POM files, I concluded that with the current version of Apache Flink (1.2.0) it is required to copy (deploy) the JAR file to the ${FLINK-HOME}/lib location.
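For completeness, the workaround amounts to copying the jar from the local Maven repository into Flink's lib directory and restarting the cluster. The paths below are assumptions; adjust them to your EMR setup:

```shell
# Hypothetical paths: adjust the Maven repository location and FLINK_HOME
# to your environment (on EMR, Flink is often installed under /usr/lib/flink).
JAR="$HOME/.m2/repository/org/apache/flink/flink-hadoop-compatibility_2.10/1.2.0/flink-hadoop-compatibility_2.10-1.2.0.jar"
cp "$JAR" "$FLINK_HOME/lib/"
# Restart the Flink cluster so the jar is picked up on the classpath.
```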