org.apache.http.ContentTooLongException: entity content is too long [105539255] for the configured buffer limit [104857600]


I am trying to fetch indexed PDF documents from my Elasticsearch index. I indexed the PDF documents using the ingest-attachment processor plugin. In total, 2500 documents have been indexed along with their PDF attachments.

Now I am fetching those PDFs by searching on their contents, and I am getting the error below.

org.apache.http.ContentTooLongException: entity content is too long [105539255] for the configured buffer limit [104857600]
    at org.elasticsearch.client.HeapBufferedAsyncResponseConsumer.onEntityEnclosed(HeapBufferedAsyncResponseConsumer.java:76)
    at org.apache.http.nio.protocol.AbstractAsyncResponseConsumer.responseReceived(AbstractAsyncResponseConsumer.java:131)
    at org.apache.http.impl.nio.client.MainClientExec.responseReceived(MainClientExec.java:315)
    at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.responseReceived(DefaultClientExchangeHandlerImpl.java:147)
    at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.responseReceived(HttpAsyncRequestExecutor.java:303)
    at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:255)
    at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81)
    at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39)
    at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:114)
    at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162)
    at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337)
    at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315)
    at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276)
    at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
    at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:588)
    at java.lang.Thread.run(Thread.java:748)
Exception in thread "main" java.lang.NullPointerException
    at com.es.utility.DocumentSearch.main(DocumentSearch.java:88)

Here is my Java API code to fetch the documents from Elasticsearch:

private final static String ATTACHMENT = "document_attachment";
private final static String TYPE = "doc";

public static void main(String args[])
{
    RestHighLevelClient restHighLevelClient = null;

    try {
        restHighLevelClient = new RestHighLevelClient(RestClient.builder(new HttpHost("localhost", 9200, "http"),
                new HttpHost("localhost", 9201, "http")));

    } catch (Exception e) {
        System.out.println(e.getMessage());
    }



    SearchRequest contentSearchRequest = new SearchRequest(ATTACHMENT); 
    SearchSourceBuilder contentSearchSourceBuilder = new SearchSourceBuilder();
    contentSearchRequest.types(TYPE);
    QueryBuilder attachmentQB = QueryBuilders.matchQuery("attachment.content", "activa");
    contentSearchSourceBuilder.query(attachmentQB);
    contentSearchSourceBuilder.size(50);
    contentSearchRequest.source(contentSearchSourceBuilder);
    SearchResponse contentSearchResponse = null;
    System.out.println("Request --->"+contentSearchRequest.toString());
    try {
        contentSearchResponse = restHighLevelClient.search(contentSearchRequest);
    } catch (IOException e) {
        e.getLocalizedMessage();
    }

    try {
        System.out.println("Response --->"+restHighLevelClient.search(contentSearchRequest)); // the ContentTooLongException shown above is printed from this line
    } catch (IOException e) {
        e.printStackTrace();
    }
    SearchHit[] contentSearchHits = contentSearchResponse.getHits().getHits();
    long contenttotalHits=contentSearchResponse.getHits().totalHits;
    System.out.println("condition Total Hits --->"+contenttotalHits);

I am using Elasticsearch version 6.2.3.


Solution

  • You need to increase the http.max_content_length in your elasticsearch.yml config file.

    By default, it is set to 100MB (100*1024*1024 = 104857600 bytes), so you probably need to set it a little higher than that.

    UPDATE

    It is actually a different issue, which is explained here. Basically, the default HttpAsyncResponseConsumerFactory buffers the whole response body in heap memory, but only up to 100MB by default. The workaround is to configure a larger size for that buffer, but for now your only option is to use the low-level REST client instead. In ES 7, you'll be able to do this with the High-level REST client using a class called RequestOptions, but it's not released yet.

    // restClient is the low-level RestClient, e.g. restHighLevelClient.getLowLevelClient()
    // The factory constructor takes an int, so the limit is declared as int rather than long.
    int BUFFER_SIZE = 120 * 1024 * 1024;     // set buffer to 120MB instead of 100MB
    Map<String, String> params = Collections.emptyMap();
    // Send the same query body that was built for the high-level search request
    HttpEntity entity = new NStringEntity(contentSearchSourceBuilder.toString(), ContentType.APPLICATION_JSON);
    HttpAsyncResponseConsumerFactory.HeapBufferedResponseConsumerFactory consumerFactory =
            new HttpAsyncResponseConsumerFactory.HeapBufferedResponseConsumerFactory(BUFFER_SIZE);
    Response response = restClient.performRequest("GET", "/document_attachment/doc/_search", params, entity, consumerFactory);
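
To sanity-check the numbers in the error message, here is a small standalone sketch of the size arithmetic. It uses no Elasticsearch classes: the class name BufferLimit and its helper methods are hypothetical, and the comparison only approximates the check the client appears to perform (response length greater than the configured limit), so treat it as an illustration rather than the client's actual code.

```java
// Hypothetical helper, not part of the Elasticsearch client.
class BufferLimit {
    // Default limit reported in the exception: 100 * 1024 * 1024 bytes.
    static final int DEFAULT_LIMIT_BYTES = 100 * 1024 * 1024;

    // Converts megabytes to bytes, failing loudly if the result overflows an int
    // (the buffer limit in the answer's snippet is passed as an int).
    static int megabytes(int mb) {
        return Math.multiplyExact(mb, 1024 * 1024);
    }

    // Assumed shape of the check: a response is rejected when it is larger than the limit.
    static boolean exceeds(long contentLength, int limitBytes) {
        return contentLength > limitBytes;
    }

    public static void main(String[] args) {
        long reported = 105539255L; // content length from the exception message
        System.out.println("default limit bytes: " + DEFAULT_LIMIT_BYTES);
        System.out.println("exceeds default:     " + exceeds(reported, DEFAULT_LIMIT_BYTES));
        System.out.println("exceeds 120MB:       " + exceeds(reported, megabytes(120)));
    }
}
```

The response in the question is only about 0.65MB over the default, so 120MB leaves some headroom, but a larger result set (for example a bigger size than 50) could exceed it again.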