Tags: java, azure, elasticsearch, elasticsearch-java-api, azure-elasticpool

SocketTimeoutException while retrieving or inserting data into Elasticsearch using the REST High Level Client


I'm getting a SocketTimeoutException while retrieving data from or inserting data into Elasticsearch. It happens at around 10-30 requests per second, with a mix of GET and PUT requests.

Here is my elastic configuration:

  • 3 master nodes, each with 4 GB RAM
  • 2 data nodes, each with 8 GB RAM
  • An Azure load balancer in front of the data nodes above (only port 9200 seems to be open on it). The Java client connects to this load balancer, since it is the only endpoint exposed.
  • Elastic Version: 7.2.0
  • Rest High Level Client:

    <dependency>
        <groupId>org.elasticsearch.client</groupId>
        <artifactId>elasticsearch-rest-high-level-client</artifactId>
        <version>7.2.0</version>
    </dependency>
    
    <dependency>
        <groupId>org.elasticsearch</groupId>
        <artifactId>elasticsearch</artifactId>
        <version>7.2.0</version>
    </dependency>
    

Index Information:

  • Index shards: 2
  • Index replica: 1
  • Index total fields: 10000
  • Index size as reported by Kibana: total 27.2 MB, primaries 12.2 MB
  • Index structure:
    {
      "dev-index": {
        "mappings": {
          "properties": {
            "dataObj": {
              "type": "object",
              "enabled": false
            },
            "generatedID": {
              "type": "keyword"
            },
            "transNames": { //it's array of string
              "type": "keyword"
            }
          }
        }
      }
    }
    
  • Dynamic mapping is disabled.

Below is my Elasticsearch config class. It defines two client beans: one for reads from Elasticsearch and one for writes.

ElasticConfig.java:

@Configuration
public class ElasticConfig {

    @Value("${elastic.host}")
    private String elasticHost;

    @Value("${elastic.port}")
    private int elasticPort;

    @Value("${elastic.user}")
    private String elasticUser;

    @Value("${elastic.pass}")
    private String elasticPass;

    @Value("${elastic-timeout:20}")
    private int timeout;

    @Bean(destroyMethod = "close")
    @Qualifier("readClient")
    public RestHighLevelClient readClient(){

        final CredentialsProvider credentialsProvider = new BasicCredentialsProvider();
        credentialsProvider.setCredentials(AuthScope.ANY, new UsernamePasswordCredentials(elasticUser, elasticPass));

        RestClientBuilder builder = RestClient
                .builder(new HttpHost(elasticHost, elasticPort))
                .setHttpClientConfigCallback(httpClientBuilder -> 
                        httpClientBuilder
                                .setDefaultCredentialsProvider(credentialsProvider)
                                .setDefaultIOReactorConfig(IOReactorConfig.custom().setIoThreadCount(5).build())
                );

        builder.setRequestConfigCallback(requestConfigBuilder -> 
                requestConfigBuilder
                        .setConnectTimeout(10000)
                        .setSocketTimeout(60000)
                        .setConnectionRequestTimeout(0)
        );

        RestHighLevelClient restClient = new RestHighLevelClient(builder);
        return restClient;
    }

    @Bean(destroyMethod = "close")
    @Qualifier("writeClient")
    public RestHighLevelClient writeClient(){

        final CredentialsProvider credentialsProvider = new BasicCredentialsProvider();
        credentialsProvider.setCredentials(AuthScope.ANY, new UsernamePasswordCredentials(elasticUser, elasticPass));

        RestClientBuilder builder = RestClient
                .builder(new HttpHost(elasticHost, elasticPort))
                .setHttpClientConfigCallback(httpClientBuilder -> 
                        httpClientBuilder
                                .setDefaultCredentialsProvider(credentialsProvider)
                                .setDefaultIOReactorConfig(IOReactorConfig.custom().setIoThreadCount(5).build())
                );

        builder.setRequestConfigCallback(requestConfigBuilder -> 
                requestConfigBuilder
                        .setConnectTimeout(10000)
                        .setSocketTimeout(60000)
                        .setConnectionRequestTimeout(0)
        );

        RestHighLevelClient restClient = new RestHighLevelClient(builder);
        return restClient;
    }

}

Here is the function that calls Elasticsearch: if the data is already there it is returned; otherwise the data is generated and put into Elasticsearch.

public Object getData(Request request) {

    DataObj elasticResult = elasticService.getData(request);
    if(elasticResult!=null){
        return elasticResult;
    }
    else{
        //code to generate data
        DataObj generatedData = getData();//some function which will generate the data
        //put the above data into elastic with an async call
        elasticAsync.putData(generatedData);
        return generatedData;
    }
}

ElasticService.java getData Function:

@Service
public class ElasticService {

    @Value("${elastic.index}")
    private String elasticIndex;

    @Autowired
    @Qualifier("readClient")
    private RestHighLevelClient readClient;

    public DataObj getData(Request request){
        String generatedId = request.getGeneratedID();

        GetRequest getRequest = new GetRequest()
                .index(elasticIndex)   //elastic index name
                .id(generatedId);   //retrieving by index id from elastic _id field (as key-value)

        DataObj result = null;
        try {
            GetResponse response = readClient.get(getRequest, RequestOptions.DEFAULT);
            if(response.isExists()) {
                ObjectMapper objectMapper = new ObjectMapper();
                result = objectMapper.readValue(response.getSourceAsString(), DataObj.class);
            }
        }  catch (Exception e) {
            LOGGER.error("Exception occurred during  fetch from elastic !!!! ", e);
        }
        return result;
    }

}

ElasticAsync.java Async Put Data Function:

@Service
public class ElasticAsync {

    private static final Logger LOGGER = Logger.getLogger(ElasticAsync.class.getName());

    @Value("${elastic.index}")
    private String elasticIndex;

    @Autowired
    @Qualifier("writeClient")
    private RestHighLevelClient writeClient;

    @Async
    public void putData(DataObj generatedData){
        ElasticVO updatedRequest = toElasticVO(generatedData); // ElasticVO matches the index structure given above

        try {
            ObjectMapper objectMapper = new ObjectMapper();
            String jsonString = objectMapper.writeValueAsString(updatedRequest);

            IndexRequest request = new IndexRequest(elasticIndex);
            request.id(generatedData.getGeneratedID());
            request.source(jsonString, XContentType.JSON);
            request.setRefreshPolicy(WriteRequest.RefreshPolicy.NONE);
            request.timeout(TimeValue.timeValueSeconds(5));
            IndexResponse indexResponse = writeClient.index(request, RequestOptions.DEFAULT);
            LOGGER.info("response id: " + indexResponse.getId());
        } catch (Exception e) {
            LOGGER.error("Exception occurred during saving into elastic !!!!",e);
        }
    }

}

Here is part of the stack trace from the exception thrown while saving data into Elasticsearch:

2019-07-19 07:32:19.997 ERROR [data-retrieval,341e6ecc5b10f3be,1eeb0722983062b2,true] 1 --- [askExecutor-894] a.c.s.a.service.impl.ElasticAsync        : Exception occurred during saving into elastic !!!!

java.net.SocketTimeoutException: 60,000 milliseconds timeout on connection http-outgoing-34 [ACTIVE]
    at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:789) ~[elasticsearch-rest-client-7.2.0.jar!/:7.2.0]
    at org.elasticsearch.client.RestClient.performRequest(RestClient.java:225) ~[elasticsearch-rest-client-7.2.0.jar!/:7.2.0]
    at org.elasticsearch.client.RestClient.performRequest(RestClient.java:212) ~[elasticsearch-rest-client-7.2.0.jar!/:7.2.0]
    at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1448) ~[elasticsearch-rest-high-level-client-7.2.0.jar!/:7.2.0]
    at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1418) ~[elasticsearch-rest-high-level-client-7.2.0.jar!/:7.2.0]
    at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1388) ~[elasticsearch-rest-high-level-client-7.2.0.jar!/:7.2.0]
    at org.elasticsearch.client.RestHighLevelClient.index(RestHighLevelClient.java:836) ~[elasticsearch-rest-high-level-client-7.2.0.jar!/:7.2.0]


Caused by: java.net.SocketTimeoutException: 60,000 milliseconds timeout on connection http-outgoing-34 [ACTIVE]
    at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.timeout(HttpAsyncRequestExecutor.java:387) ~[httpcore-nio-4.4.11.jar!/:4.4.11]
    at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:92) ~[httpasyncclient-4.1.3.jar!/:4.1.3]
    at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:39) ~[httpasyncclient-4.1.3.jar!/:4.1.3]
    at org.apache.http.impl.nio.reactor.AbstractIODispatch.timeout(AbstractIODispatch.java:175) ~[httpcore-nio-4.4.11.jar!/:4.4.11]
    at org.apache.http.impl.nio.reactor.BaseIOReactor.sessionTimedOut(BaseIOReactor.java:263) ~[httpcore-nio-4.4.11.jar!/:4.4.11]
    at org.apache.http.impl.nio.reactor.AbstractIOReactor.timeoutCheck(AbstractIOReactor.java:492) ~[httpcore-nio-4.4.11.jar!/:4.4.11]
    at org.apache.http.impl.nio.reactor.BaseIOReactor.validate(BaseIOReactor.java:213) ~[httpcore-nio-4.4.11.jar!/:4.4.11]
    at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:280) ~[httpcore-nio-4.4.11.jar!/:4.4.11]
    at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104) ~[httpcore-nio-4.4.11.jar!/:4.4.11]
    at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591) ~[httpcore-nio-4.4.11.jar!/:4.4.11]
    ... 1 common frames omitted

Here is part of the stack trace from the exception thrown while retrieving data from Elasticsearch:

2019-07-19 07:22:37.844 ERROR [data-retrieval,104cf6b2ab5b3349,b302d3d3cd7ebc84,true] 1 --- [o-8080-exec-346] a.c.s.a.service.impl.ElasticService      : Exception occurred during  fetch from elastic !!!! 

java.net.SocketTimeoutException: 60,000 milliseconds timeout on connection http-outgoing-30 [ACTIVE]
    at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:789) ~[elasticsearch-rest-client-7.1.1.jar!/:7.1.1]
    at org.elasticsearch.client.RestClient.performRequest(RestClient.java:225) ~[elasticsearch-rest-client-7.1.1.jar!/:7.1.1]
    at org.elasticsearch.client.RestClient.performRequest(RestClient.java:212) ~[elasticsearch-rest-client-7.1.1.jar!/:7.1.1]
    at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1433) ~[elasticsearch-rest-high-level-client-7.1.1.jar!/:7.1.1]
    at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1403) ~[elasticsearch-rest-high-level-client-7.1.1.jar!/:7.1.1]
    at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1373) ~[elasticsearch-rest-high-level-client-7.1.1.jar!/:7.1.1]
    at org.elasticsearch.client.RestHighLevelClient.get(RestHighLevelClient.java:699) ~[elasticsearch-rest-high-level-client-7.1.1.jar!/:7.1.1]



Caused by: java.net.SocketTimeoutException: 60,000 milliseconds timeout on connection http-outgoing-30 [ACTIVE]
    at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.timeout(HttpAsyncRequestExecutor.java:387) ~[httpcore-nio-4.4.11.jar!/:4.4.11]
    at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:92) ~[httpasyncclient-4.1.3.jar!/:4.1.3]
    at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:39) ~[httpasyncclient-4.1.3.jar!/:4.1.3]
    at org.apache.http.impl.nio.reactor.AbstractIODispatch.timeout(AbstractIODispatch.java:175) ~[httpcore-nio-4.4.11.jar!/:4.4.11]
    at org.apache.http.impl.nio.reactor.BaseIOReactor.sessionTimedOut(BaseIOReactor.java:263) ~[httpcore-nio-4.4.11.jar!/:4.4.11]
    at org.apache.http.impl.nio.reactor.AbstractIOReactor.timeoutCheck(AbstractIOReactor.java:492) ~[httpcore-nio-4.4.11.jar!/:4.4.11]
    at org.apache.http.impl.nio.reactor.BaseIOReactor.validate(BaseIOReactor.java:213) ~[httpcore-nio-4.4.11.jar!/:4.4.11]
    at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:280) ~[httpcore-nio-4.4.11.jar!/:4.4.11]
    at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104) ~[httpcore-nio-4.4.11.jar!/:4.4.11]
    at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591) ~[httpcore-nio-4.4.11.jar!/:4.4.11]
    ... 1 common frames omitted

I've gone through a couple of Stack Overflow posts and Elastic-related blogs, which suggest this issue could be due to the RAM and cluster configuration of Elasticsearch. I then reduced the number of shards from 5 to 2, since there are only two data nodes. I also increased the RAM of the data nodes from 4 GB to 8 GB, as I learned that Elasticsearch will use only 50% of the total RAM. The exception now occurs less often, but the problem still persists.

What are possible ways to solve this problem? What am I missing in the Java or Elasticsearch configuration that makes this SocketTimeoutException occur so frequently? Let me know if you need any more details about the configuration.


Solution

  • We had the same issue, and after quite some digging I found the root cause: a mismatch between the idle timeout of the firewall sitting between the client and the Elasticsearch servers and the kernel configuration for TCP keep-alive.

    The firewall drops idle connections after 3600 seconds. The problem was that the kernel parameter for TCP keep-alive was set to 7200 seconds (the default in RedHat 6.x/7.x):

    sysctl -n net.ipv4.tcp_keepalive_time
    7200
    

    So the connections are dropped before a keep-alive probe is ever sent. The async HTTP client inside the Elasticsearch REST client doesn't seem to handle dropped connections very well; it just waits until the socket timeout expires.

    So check whether any network device (load balancer, firewall, proxy, etc.) between your client and server has a session timeout or similar, and either increase that timeout or lower the net.ipv4.tcp_keepalive_time kernel parameter.
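To make the mismatch concrete, here is a minimal sketch that uses only the JDK, not the Elasticsearch client. It checks whether a keep-alive idle time beats a given idle timeout, and shows how a plain `java.net.Socket` can enable SO_KEEPALIVE and, where the JDK and OS support it, override the idle time per socket instead of changing the kernel default. The class name, the method names and the 300-second value are my own assumptions for illustration; only the 3600 and 7200 second figures come from the case above. The Elasticsearch REST client creates its sockets through Apache HttpAsyncClient, so there the equivalent setting would have to go through the `IOReactorConfig` passed to `setDefaultIOReactorConfig` (I believe `setSoKeepAlive(true)` is the relevant switch, but verify that against the client version you run).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.Socket;
import java.net.SocketOption;
import jdk.net.ExtendedSocketOptions;

// Hypothetical helper class, not part of any Elasticsearch or Apache API.
final class KeepAliveSketch {

    /** True only if the first keep-alive probe goes out before the middlebox's idle timeout. */
    static boolean probeBeatsIdleTimeout(int keepAliveIdleSeconds, int idleTimeoutSeconds) {
        return keepAliveIdleSeconds < idleTimeoutSeconds;
    }

    /**
     * Turns on SO_KEEPALIVE and, if this JDK/OS exposes TCP_KEEPIDLE (Java 11+),
     * overrides the idle time for this one socket.
     * Returns true if the per-socket override was applied.
     */
    static boolean enableKeepAlive(Socket socket, int idleSeconds) {
        try {
            socket.setKeepAlive(true);
            SocketOption<Integer> keepIdle = ExtendedSocketOptions.TCP_KEEPIDLE;
            if (socket.supportedOptions().contains(keepIdle)) {
                socket.setOption(keepIdle, idleSeconds);
                return true;
            }
            // Otherwise the kernel-wide default (net.ipv4.tcp_keepalive_time on Linux) applies.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Creates an unconnected socket, applies the settings, and reports whether SO_KEEPALIVE is on. */
    static boolean keepAliveEnabledOnFreshSocket(int idleSeconds) {
        try (Socket socket = new Socket()) {
            enableKeepAlive(socket, idleSeconds);
            return socket.getKeepAlive();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int firewallIdleSeconds = 3600; // idle timeout of the firewall in the case above
        System.out.println("7200 s keep-alive survives: " + probeBeatsIdleTimeout(7200, firewallIdleSeconds));
        System.out.println("300 s keep-alive survives:  " + probeBeatsIdleTimeout(300, firewallIdleSeconds));
        System.out.println("SO_KEEPALIVE enabled:       " + keepAliveEnabledOnFreshSocket(300));
    }
}
```

Note that SO_KEEPALIVE alone is not enough in the situation described here: with the 7200-second kernel default the first probe would still come after the firewall has already dropped the connection, so the idle time itself has to end up below the firewall's timeout.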