I am trying to set the state backend to HDFS:
val stateUri = "hdfs/path_to_dir"
val backend: RocksDBStateBackend = new RocksDBStateBackend(stateUri, true)
env.setStateBackend(backend)
I am running Flink 1.7.0 with the following dependencies (I tried all combinations):
"org.apache.flink" %% "flink-connector-filesystem" % flinkV
"org.apache.flink" % "flink-hadoop-fs" % flinkV
"org.apache.hadoop" % "hadoop-hdfs" % hadoopVersion
"org.apache.hadoop" % "hadoop-common" % hadoopVersion
However, when running the jar I get this error:
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 'hdfs'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded.
at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:403)
at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:318)
at org.apache.flink.core.fs.Path.getFileSystem(Path.java:298)
at org.apache.flink.runtime.state.filesystem.FsCheckpointStorage.<init>(FsCheckpointStorage.java:58)
at org.apache.flink.runtime.state.filesystem.FsStateBackend.createCheckpointStorage(FsStateBackend.java:444)
at org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createCheckpointStorage(RocksDBStateBackend.java:407)
at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.<init>(CheckpointCoordinator.java:249)
... 17 more
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Hadoop is not in the classpath/dependencies.
at org.apache.flink.core.fs.UnsupportedSchemeFactory.create(UnsupportedSchemeFactory.java:64)
at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:399)
... 23 more
Any help will be greatly appreciated.
For accessing an hdfs:// path it is not strictly necessary to bundle flink-hadoop-fs with your job, as long as you have flink-shaded-hadoop2-uber-1.8-SNAPSHOT.jar in the lib folder of your Flink installation.
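Concretely, that means copying the shaded Hadoop uber jar into the distribution's lib directory and restarting the cluster so it is picked up. A sketch of the steps, assuming a hypothetical installation path (/opt/flink-1.7.0) that you would adjust to your setup:

```shell
# Hypothetical paths -- adjust to your Flink installation and the exact
# shaded-hadoop jar version you downloaded.
cp flink-shaded-hadoop2-uber-1.8-SNAPSHOT.jar /opt/flink-1.7.0/lib/

# Jars in lib/ are only scanned at startup, so restart the cluster.
/opt/flink-1.7.0/bin/stop-cluster.sh
/opt/flink-1.7.0/bin/start-cluster.sh
```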
If you don't have this dependency in your lib folder, then I would suggest using flink-fs-hadoop-shaded as a dependency instead, because it also relocates the Hadoop dependencies.
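In sbt terms, matching the dependency style used in the question, that would be something like the following (the exact coordinates are my assumption; verify the artifact and version on Maven Central for your Flink release):

```scala
// Assumed coordinates for the shaded Hadoop filesystem artifact -- check
// that this artifact exists for your flinkV before relying on it.
"org.apache.flink" % "flink-fs-hadoop-shaded" % flinkV
```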
Moreover, it is important that this dependency is also included in your resulting job jar. Thus, please make sure that you create an uber-jar with the sbt-assembly plugin.
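A minimal build.sbt sketch for the assembly step (the merge strategy shown is my suggestion, not part of the original answer): Flink discovers file system implementations via ServiceLoader entries under META-INF/services, so those files must be concatenated rather than discarded when merging jars, or the hdfs scheme will still not be found at runtime.

```scala
// build.sbt fragment -- assumes sbt-assembly is enabled in project/plugins.sbt.
assemblyMergeStrategy in assembly := {
  // Concatenate ServiceLoader registrations (e.g. FileSystemFactory entries)
  // so the shaded Hadoop filesystem remains discoverable in the uber-jar.
  case PathList("META-INF", "services", _ @ _*) => MergeStrategy.concat
  // Drop remaining META-INF metadata (signatures, manifests) to avoid conflicts.
  case PathList("META-INF", _ @ _*)             => MergeStrategy.discard
  case _                                        => MergeStrategy.first
}
```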