
Apache Flink (v1.6.0) authenticate Elasticsearch Sink (v6.4)


I am using Apache Flink v1.6.0 and trying to write to Elasticsearch v6.4.0, which is hosted in Elastic Cloud. I am having trouble authenticating to the Elastic Cloud cluster.

I have been able to get Flink to write to a local Elasticsearch v6.4.0 node, which does not have encryption enabled, using the following code:

/*
    Elasticsearch Configuration
*/
List<HttpHost> httpHosts = new ArrayList<>();
httpHosts.add(new HttpHost("127.0.0.1", 9200, "http"));

// use an ElasticsearchSink.Builder to create an ElasticsearchSink
ElasticsearchSink.Builder<ObjectNode> esSinkBuilder = new ElasticsearchSink.Builder<>(
        httpHosts,
        new ElasticsearchSinkFunction<ObjectNode>() {
            private IndexRequest createIndexRequest(ObjectNode payload) {

                // remove the value node so the fields are at the base of the json payload
                JsonNode jsonOutput = payload.get("value");

                return Requests.indexRequest()
                        .index("raw-payload")
                        .type("payload")
                        .source(jsonOutput.toString(), XContentType.JSON);
            }

            @Override
            public void process(ObjectNode payload, RuntimeContext ctx, RequestIndexer indexer) {
                indexer.add(createIndexRequest(payload));
            }
        }
);

// set number of events to be seen before writing to Elasticsearch
esSinkBuilder.setBulkFlushMaxActions(1);

// finally, build and add the sink to the job's pipeline
stream.addSink(esSinkBuilder.build());

However, authentication fails when I add it to the code base as documented in the Flink documentation and in the corresponding Elasticsearch Java client documentation. The code I added looks like this:

// provide a RestClientFactory for custom configuration on the internally created REST client
Header[] defaultHeaders = new Header[]{new BasicHeader("username", "password")};
esSinkBuilder.setRestClientFactory(
        restClientBuilder -> {
            restClientBuilder.setDefaultHeaders(defaultHeaders);
        }
);

I get the following error when executing the job:

14:49:54,700 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService              - Stopped Akka RPC service.
Exception in thread "main" org.apache.flink.runtime.client.JobExecutionException: org.elasticsearch.ElasticsearchStatusException: method [HEAD], host [https://XXXXXXXXXXXXXX.europe-west1.gcp.cloud.es.io:9243], URI [/], status line [HTTP/1.1 401 Unauthorized]
    at org.apache.flink.runtime.minicluster.MiniCluster.executeJobBlocking(MiniCluster.java:623)
    at org.apache.flink.streaming.api.environment.LocalStreamEnvironment.execute(LocalStreamEnvironment.java:123)
    at com.downuk.AverageStockSalePrice.main(AverageStockSalePrice.java:146)
Caused by: org.elasticsearch.ElasticsearchStatusException: method [HEAD], host [https://XXXXXXXXXXXXXX.europe-west1.gcp.cloud.es.io:9243], URI [/], status line [HTTP/1.1 401 Unauthorized]
    at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:625)

Can anyone help point out where I am going wrong?


Solution

  • I was able to work it out after looking at the Flink example and the Elasticsearch documentation.

    It turned out that I was setting the wrong configuration above. This call:

    restClientBuilder.setDefaultHeaders(...);
    

    is not what needed setting. What I actually needed was:

    restClientBuilder.setHttpClientConfigCallback(...);
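
    For anyone wondering why the first attempt returned 401: new BasicHeader("username", "password") sends a header literally named "username" with the value "password", which is not a form of credentials that HTTP Basic authentication recognises. Basic authentication expects an Authorization header whose value is "Basic " followed by the Base64 encoding of "username:password". The following is a minimal, JDK-only sketch of how that value is built. The class name, the method name, and the elastic/changeme credentials are placeholders chosen for illustration, not values from the original job.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustrative helper (hypothetical name): builds the value of an
// HTTP Basic "Authorization" header from a username and password.
class BasicAuthHeaderSketch {

    static String basicAuthValue(String username, String password) {
        // Basic auth encodes "username:password" as Base64
        String pair = username + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(pair.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // placeholder credentials, for demonstration only
        System.out.println("Authorization: " + basicAuthValue("elastic", "changeme"));
        // prints "Authorization: Basic ZWxhc3RpYzpjaGFuZ2VtZQ=="
    }
}
```

    With the Apache HttpClient classes already used in the question, a header built this way would presumably be passed as new BasicHeader("Authorization", value). The credentials-provider approach in this answer avoids hand-building the header at all.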
    

    Once you use the correct custom configuration, the rest is pretty simple. The part I was missing was:

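    // imports needed by this snippet
    import org.apache.http.auth.AuthScope;
    import org.apache.http.auth.UsernamePasswordCredentials;
    import org.apache.http.client.CredentialsProvider;
    import org.apache.http.impl.client.BasicCredentialsProvider;
    import org.apache.http.impl.nio.client.HttpAsyncClientBuilder;
    import org.elasticsearch.client.RestClientBuilder;

    // provide a RestClientFactory for custom configuration on the internally created REST client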
    esSinkBuilder.setRestClientFactory(
        restClientBuilder -> {
            restClientBuilder.setHttpClientConfigCallback(new RestClientBuilder.HttpClientConfigCallback() {
                @Override
                public HttpAsyncClientBuilder customizeHttpClient(HttpAsyncClientBuilder httpClientBuilder) {
    
                    // elasticsearch username and password
                    CredentialsProvider credentialsProvider = new BasicCredentialsProvider();
                    credentialsProvider.setCredentials(AuthScope.ANY, new UsernamePasswordCredentials("$USERNAME", "$PASSWORD"));
    
                    return httpClientBuilder.setDefaultCredentialsProvider(credentialsProvider);
                }
            });
        }
    );
    

    To finish off, here is a full snippet for the Elasticsearch sink:

    /*
        Elasticsearch Configuration
    */
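    List<HttpHost> httpHosts = new ArrayList<>();
    // local node shown here; for Elastic Cloud, use the cluster's hostname with
    // port 9243 and the "https" scheme, as seen in the error message above
    httpHosts.add(new HttpHost("127.0.0.1", 9200, "http"));
    
    // use an ElasticsearchSink.Builder to create an ElasticsearchSink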
    ElasticsearchSink.Builder<ObjectNode> esSinkBuilder = new ElasticsearchSink.Builder<>(
            httpHosts,
            new ElasticsearchSinkFunction<ObjectNode>() {
                private IndexRequest createIndexRequest(ObjectNode payload) {
    
                    // remove the value node so the fields are at the base of the json payload
                    JsonNode jsonOutput = payload.get("value");
    
                    return Requests.indexRequest()
                            .index("raw-payload")
                            .type("payload")
                            .source(jsonOutput.toString(), XContentType.JSON);
                }
    
                @Override
                public void process(ObjectNode payload, RuntimeContext ctx, RequestIndexer indexer) {
                    indexer.add(createIndexRequest(payload));
                }
            }
    );
    
    // set number of events to be seen before writing to Elasticsearch
    esSinkBuilder.setBulkFlushMaxActions(1);
    
    // provide a RestClientFactory for custom configuration on the internally created REST client
    esSinkBuilder.setRestClientFactory(
        restClientBuilder -> {
            restClientBuilder.setHttpClientConfigCallback(new RestClientBuilder.HttpClientConfigCallback() {
                @Override
                public HttpAsyncClientBuilder customizeHttpClient(HttpAsyncClientBuilder httpClientBuilder) {
    
                    // elasticsearch username and password
                    CredentialsProvider credentialsProvider = new BasicCredentialsProvider();
                    credentialsProvider.setCredentials(AuthScope.ANY, new UsernamePasswordCredentials("$USERNAME", "$PASSWORD"));
    
                    return httpClientBuilder.setDefaultCredentialsProvider(credentialsProvider);
                }
            });
        }
    );
    
    // finally, build and add the sink to the job's pipeline
    stream.addSink(esSinkBuilder.build());
    

    I hope this helps anyone else who was stuck in the same place!