Tags: amazon-s3, apache-flink, flink-streaming

Flink S3 write fails: Unable to load AWS credentials from any provider in the chain


When I use Flink's streaming API to write to S3:

// Get the StreamExecutionEnvironment
final StreamExecutionEnvironment env = 
StreamExecutionEnvironment.getExecutionEnvironment();

// Use event time as the stream time characteristic
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

// Add source (input stream)
DataStream<String> dataStream = StreamUtil.getDataStream(env, params);

// Sink to S3 Bucket
dataStream.writeAsText("s3a://test-flink/test.txt").setParallelism(1);

I get the following error:

Unable to load AWS credentials from any provider in the chain

My configuration is:

# flink --version
Version: 1.3.1, Commit ID: 1ca6e5b

The Hadoop config directory was added to flink-conf.yaml

# cat flink/config/flink-conf.yaml | head -n1
fs.hdfs.hadoopconf: /root/hadoop-config

The rest of the content of flink-conf.yaml is identical to the release version.

The following was added to /root/hadoop-config/core-site.xml

# cat  /root/hadoop-config/core-site.xml
<configuration>
<property>
    <name>fs.s3a.impl</name>
    <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
</property>

<property>
    <name>fs.s3a.buffer.dir</name>
    <value>/tmp</value>
</property>

<property>
    <name>fs.s3a.access.key</name>
    <value>MY_ACCESS_KEY</value>
</property>

<property>
    <name>fs.s3a.secret.key</name>
    <value>MY_SECRET_KEY</value>
</property>
</configuration>

The JARs aws-java-sdk-1.7.4.jar, hadoop-aws-2.7.4.jar, httpclient-4.2.5.jar, and httpcore-4.2.5.jar were added to flink/lib/ from http://apache.mirror.anlx.net/hadoop/common/hadoop-2.7.4/hadoop-2.7.4.tar.gz

# ls flink/lib/
aws-java-sdk-1.7.4.jar
flink-dist_2.11-1.3.1.jar
flink-python_2.11-1.3.1.jar
flink-shaded-hadoop2-uber-1.3.1.jar
hadoop-aws-2.7.4.jar
httpclient-4.2.5.jar
httpcore-4.2.5.jar
log4j-1.2.17.jar
slf4j-log4j12-1.7.7.jar

Note that the aws-java-sdk JAR is version 1.7.4, not 1.7.2 as stated in the docs here

pom.xml has the following build dependencies.

    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-connector-filesystem_2.10</artifactId>
        <version>1.3.1</version>
    </dependency>

    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-aws</artifactId>
        <version>2.7.2</version>
    </dependency>
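
One observation (an assumption on my part, not stated in the original post): the pom.xml pins hadoop-aws to 2.7.2 while the JARs copied into flink/lib/ are 2.7.4. Aligning the dependency with the deployed JARs avoids mixing Hadoop versions:

```xml
<!-- Sketch: match the hadoop-aws version to the 2.7.4 JARs in flink/lib/ -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-aws</artifactId>
    <version>2.7.4</version>
</dependency>
```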

My reference was the Flink documentation: https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/aws.html#set-s3-filesystem

I am able to write to the S3 bucket using the credentials in core-site.xml with awscli.
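
As an editorial aside (not part of the original question): the provider chain named in the error also checks environment variables, so a quick sanity check is to export the same keys in the shell that launches Flink. MY_ACCESS_KEY and MY_SECRET_KEY are the same placeholders used in core-site.xml above.

```shell
# Sketch: the AWS default credential provider chain also reads these
# environment variables; export them before starting the Flink cluster.
export AWS_ACCESS_KEY_ID=MY_ACCESS_KEY
export AWS_SECRET_ACCESS_KEY=MY_SECRET_KEY
```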


Solution

  • I have used the DataStream API to write to S3; in my case, core-site.xml is actually present inside the job JAR with the same configuration. Can you please try this approach?

    The error occurs when the S3 client cannot obtain credentials from any provider in its chain, as described in the link below.

    There are other approaches for providing the credentials, described here: http://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html
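
    One way to reproduce the bundled-configuration setup described above (a sketch assuming a standard Maven project layout; the directory names are conventions, not taken from the original post) is to place core-site.xml under src/main/resources, so it is packaged into the job JAR and found on the classpath, where Hadoop's Configuration loads it by default:

```text
src/
└── main/
    ├── java/               (job code)
    └── resources/
        └── core-site.xml   (same fs.s3a.* properties as shown in the question)
```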