Tags: scala, hadoop, apache-flink

Getting an error when configuring the state backend to use HDFS


I am trying to set the state backend to HDFS:

import org.apache.flink.contrib.streaming.state.RocksDBStateBackend

// point the RocksDB state backend at a directory on HDFS, with incremental checkpoints enabled
val stateUri = "hdfs://path_to_dir"
val backend: RocksDBStateBackend = new RocksDBStateBackend(stateUri, true)
env.setStateBackend(backend)

I am running Flink 1.7.0 with the following dependencies (I tried all combinations):

   "org.apache.flink"    %% "flink-connector-filesystem"         % flinkV
"org.apache.flink"    % "flink-hadoop-fs"                     % flinkV
"org.apache.hadoop"   % "hadoop-hdfs"                         % hadoopVersion
"org.apache.hadoop"   % "hadoop-common"                       % hadoopVersion

However, when running the jar I get this error:

Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 'hdfs'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded.
    at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:403)
    at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:318)
    at org.apache.flink.core.fs.Path.getFileSystem(Path.java:298)
    at org.apache.flink.runtime.state.filesystem.FsCheckpointStorage.<init>(FsCheckpointStorage.java:58)
    at org.apache.flink.runtime.state.filesystem.FsStateBackend.createCheckpointStorage(FsStateBackend.java:444)
    at org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createCheckpointStorage(RocksDBStateBackend.java:407)
    at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.<init>(CheckpointCoordinator.java:249)
    ... 17 more
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Hadoop is not in the classpath/dependencies.
    at org.apache.flink.core.fs.UnsupportedSchemeFactory.create(UnsupportedSchemeFactory.java:64)
    at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:399)
    ... 23 more

Any help would be greatly appreciated.


Solution

  • For accessing an hdfs:// path it is not strictly necessary to bundle flink-hadoop-fs with your job, as long as you have flink-shaded-hadoop2-uber-1.8-SNAPSHOT.jar in the lib folder of your Flink installation.

    If you don't have this dependency in your lib folder, then I would suggest using flink-fs-hadoop-shaded as a dependency instead, because it also relocates the Hadoop dependencies.

    Moreover, it is important that this dependency is also included in your resulting job jar. So please make sure that you build an uber-jar with the sbt-assembly plugin (see the build.sbt sketch below).
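
    As a rough illustration, here is a minimal build.sbt sketch along those lines. The exact artifact names (flink-streaming-scala, flink-statebackend-rocksdb), the sbt-assembly plugin version, and the merge strategy are assumptions on my side; the answer itself only prescribes flink-fs-hadoop-shaded. flinkV stands for 1.7.0.

    // build.sbt -- minimal sketch, not a complete build definition
    val flinkV = "1.7.0"  // assumed to match the cluster version

    libraryDependencies ++= Seq(
      // core streaming API; provided by the Flink cluster at runtime
      "org.apache.flink" %% "flink-streaming-scala"      % flinkV % "provided",
      // RocksDB state backend used in the question
      "org.apache.flink" %% "flink-statebackend-rocksdb" % flinkV,
      // shaded (relocated) Hadoop classes so the 'hdfs' scheme can be served from the job jar
      "org.apache.flink" %  "flink-fs-hadoop-shaded"     % flinkV
    )

    // project/plugins.sbt: addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.9")
    assemblyMergeStrategy in assembly := {
      case PathList("META-INF", xs @ _*) => MergeStrategy.discard
      case _                             => MergeStrategy.first
    }

    Running sbt assembly then produces an uber-jar that ships the shaded Hadoop file system classes next to your job code.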