I have a problem sinking a file into Azure Data Lake Gen 2 with Flink's StreamingFileSink. I'm using core-site.xml with the Hadoop bulk format, and I'm trying to write to my data lake using an abfss:// path (I also tried abfs://). The job fails with:
java.lang.UnsupportedOperationException: Recoverable writers on Hadoop are only supported for HDFS
[job-playground-job-cluster-0 flink-job-cluster] at org.apache.flink.runtime.fs.hdfs.HadoopRecoverableWriter.<init>(HadoopRecoverableWriter.java:61) ~[flink-dist_2.11-1.11.0.jar:1.11.0]
[job-playground-job-cluster-0 flink-job-cluster] at org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.createRecoverableWriter(HadoopFileSystem.java:202) ~[flink-dist_2.11-1.11.0.jar:1.11.0]
[job-playground-job-cluster-0 flink-job-cluster] at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.createRecoverableWriter(SafetyNetWrapperFileSystem.java:69) ~[flink-dist_2.11-1.11.0.jar:1.11.0]
[job-playground-job-cluster-0 flink-job-cluster] at org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink$BulkFormatBuilder.createBuckets(StreamingFileSink.java:371) ~[flink-dist_2.11-1.11.0.jar:1.11.0]
I read the official documentation and dug into the library, and the problem is here: https://github.com/apache/flink/blob/master/flink-filesystems/flink-hadoop-fs/src/main/java/org/apache/flink/runtime/fs/hdfs/HadoopRecoverableWriter.java#L60
public HadoopRecoverableWriter(org.apache.hadoop.fs.FileSystem fs) {
    this.fs = checkNotNull(fs);
    // This writer is only supported on a subset of file systems
    if (!"hdfs".equalsIgnoreCase(fs.getScheme())) {
        throw new UnsupportedOperationException(
                "Recoverable writers on Hadoop are only supported for HDFS");
    }
This is my core-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>fs.azure.account.auth.type.ADLS_ACCOUNT_NAME.dfs.core.windows.net</name>
        <value>SharedKey</value>
        <description>It is inferred by the url</description>
    </property>
    <property>
        <name>fs.azure.account.key.ADLS_ACCOUNT_NAME.dfs.core.windows.net</name>
        <value>ADLS_KEY</value>
        <description></description>
    </property>
    <property>
        <name>fs.azure.createRemoteFileSystemDuringInitialization</name>
        <value>true</value>
    </property>
    <property>
        <name>fs.azure.always.use.https</name>
        <value>true</value>
    </property>
</configuration>
Has anyone gotten past this problem, or is it a limitation of the abfss/abfs scheme?
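The StreamingFileSink does not yet support Azure Data Lake. The sink needs a recoverable writer, and as the constructor quoted in the question shows, Flink's Hadoop-based recoverable writer rejects every file system whose scheme is not hdfs. That includes both abfs:// and abfss://, so the core-site.xml settings are not the cause.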
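To make the failure easier to see, here is a minimal standalone sketch of the scheme check quoted in the question. It does not use Flink or Hadoop: the class SchemeCheckSketch and the methods requireHdfs and isSupported are hypothetical names made up for illustration, and the scheme is read from the path URI, whereas Flink asks the Hadoop FileSystem object for it. The host names and container name in the paths are placeholders too.

```java
import java.net.URI;

public class SchemeCheckSketch {

    // Hypothetical helper mirroring the check quoted from HadoopRecoverableWriter:
    // only the "hdfs" scheme passes, compared case-insensitively.
    public static void requireHdfs(String scheme) {
        if (!"hdfs".equalsIgnoreCase(scheme)) {
            throw new UnsupportedOperationException(
                    "Recoverable writers on Hadoop are only supported for HDFS");
        }
    }

    // Assumption: the scheme is taken from the path URI here. In Flink it
    // comes from the Hadoop FileSystem instance (fs.getScheme()).
    public static boolean isSupported(String path) {
        String scheme = URI.create(path).getScheme();
        try {
            requireHdfs(scheme);
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder paths, not real endpoints.
        String[] paths = {
            "hdfs://namenode:8020/out",
            "abfs://container@account.dfs.core.windows.net/out",
            "abfss://container@account.dfs.core.windows.net/out"
        };
        for (String path : paths) {
            String verdict = isSupported(path) ? "accepted" : "rejected";
            System.out.println(path + " -> " + verdict);
        }
    }
}
```

Switching between abfs:// and abfss:// therefore makes no difference to this check, since neither scheme equals hdfs.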