apache-flink

Flink, odd behavior when using Hadoop Compatibility


I've added the Flink Hadoop Compatibility dependency to a project that reads a sequence file from an HDFS path:

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-hadoop-compatibility_2.11</artifactId>
    <version>1.5.6</version>
</dependency>

Here's the Java code snippet:

DataSource<Tuple2<NullWritable, BytesWritable>> input = env.createInput(HadoopInputs.readHadoopFile(
    new org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat<NullWritable, BytesWritable>(),
    NullWritable.class, BytesWritable.class, path));

This works fine when I run it inside Eclipse, but when I submit it via the command line with 'flink run ...', it complains:

The type returned by the input format could not be automatically determined. Please specify the TypeInformation of the produced type explicitly by using the 'createInput(InputFormat, TypeInformation)' method instead.

OK, so I updated my code to specify the type information explicitly:

DataSource<Tuple2<NullWritable, BytesWritable>> input = env.createInput(HadoopInputs.readHadoopFile(
    new org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat<NullWritable, BytesWritable>(),
    NullWritable.class, BytesWritable.class, path),
    TypeInformation.of(new TypeHint<Tuple2<NullWritable, BytesWritable>>() {}));

Now it complains:

Caused by: java.lang.RuntimeException: Could not load the TypeInformation for the class 'org.apache.hadoop.io.Writable'. You may be missing the 'flink-hadoop-compatibility' dependency.

Some people suggest copying flink-hadoop-compatibility_2.11-1.5.6.jar to FLINK_HOME/lib, but that doesn't help; I still get the same error.

Does anyone have a clue?

My Flink is a standalone installation, version 1.5.6.

UPDATE:

Sorry, I had copied flink-hadoop-compatibility_2.11-1.5.6.jar to the wrong place; after fixing that, it works.
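
For reference, here's a minimal self-contained version of the job that now works for me; the class name and the HDFS path below are placeholders:

import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.operators.DataSource;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.hadoopcompatibility.HadoopInputs;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;

public class ReadSequenceFileJob { // placeholder class name
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        String path = "hdfs:///path/to/sequence-files"; // placeholder path

        // Wrap the Hadoop SequenceFileInputFormat and pass the produced type explicitly,
        // since the type extractor cannot determine it on its own here.
        DataSource<Tuple2<NullWritable, BytesWritable>> input = env.createInput(
            HadoopInputs.readHadoopFile(
                new SequenceFileInputFormat<NullWritable, BytesWritable>(),
                NullWritable.class, BytesWritable.class, path),
            TypeInformation.of(new TypeHint<Tuple2<NullWritable, BytesWritable>>() {}));

        // print() eagerly runs the batch job, so no separate env.execute() call is needed.
        input.first(10).print();
    }
}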

Now my question is: is there any other way? Copying that jar file to FLINK_HOME/lib is definitely not a good option for me, especially on a big Flink cluster.


Solution

  • Fixed in Flink 1.9.0; see https://issues.apache.org/jira/browse/FLINK-12163 for details
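  • If you can upgrade, that should also answer the follow-up question: with the fix in place, bundling flink-hadoop-compatibility into the application jar should be enough, with no copy into FLINK_HOME/lib required. A sketch of the bumped dependency (assuming the project still targets Scala 2.11):

    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-hadoop-compatibility_2.11</artifactId>
        <version>1.9.0</version>
    </dependency>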