c# elasticsearch nest

Index, IndexMany, IndexAsync, IndexManyAsync with NEST


I am trying to understand the indexing options in NEST for Elasticsearch. I ran each of them, and here are my results:

    // requires: using System; using Nest;
    var node = new Uri("http://localhost:9200");
    var settings = new ConnectionSettings(node, defaultIndex: "mydatabase");
    settings.SetTimeout(1800000); // request timeout in milliseconds (30 minutes)
    var elasticClient = new ElasticClient(settings);
    var createIndexResult = elasticClient.CreateIndex("mydatabase");
    var mapResult = elasticClient.Map<Product>(c => c.MapFromAttributes().SourceField(s => s.Enabled(true)));

1) Index: When I use the Index option, iterating over each object, it works smoothly, although it is slow.

    foreach (var item in Items)
    {
        elasticClient.Index(item);
    }

2) IndexAsync: This ran without any exception, but it was not faster than the sync iteration, and fewer documents were indexed.

    foreach (var item in Items)
    {
        elasticClient.IndexAsync(item);
    }

3) IndexMany: I tried elasticClient.IndexMany(Items); without the foreach, of course. It runs faster than the foreach-plus-Index option, but when I have a lot of data (in my case 500,000 objects) it throws an exception saying:

"System.Net.WebException: The underlying connection was closed: A connection that its continuation was expected, has been closed by the server ..     at System.Net.HttpWebRequest.GetResponse ()"

When I check the log file, I can see only:

"[2016-01-14 10:21:49,567][WARN ][http.netty ] [Microchip] Caught exception while handling client http traffic, closing connection [id: 0x68398975, /0:0:0:0:0:0:0:1:57860 => /0:0:0:0:0:0:0:1:9200]"

4) IndexManyAsync: elasticClient.IndexManyAsync(Items); The async version throws a similar exception to the sync one, but I can see more information in the log file:

[2016-01-14 11:00:16,086][WARN ][http.netty ] [Microchip] Caught exception while handling client http traffic, closing connection [id: 0x43bca172, /0:0:0:0:0:0:0:1:59314 => /0:0:0:0:0:0:0:1:9200] org.elasticsearch.common.netty.handler.codec.frame.TooLongFrameException: HTTP content length exceeded 104857600 bytes.

My questions are: What exactly are the differences? In which cases might we need async? Why do both the IndexMany and IndexManyAsync options throw this exception? It looks like the Index option is the safest one. Is it OK to just use it like that?


Solution

  • Using sync or async will not have any impact on Elasticsearch indexing performance. You would want to use async if you do not want to block your client code on completion of indexing, that's all.

    As for Index vs IndexMany, it is always recommended to use the latter, to take advantage of batching and avoid too many request/response cycles between your client and Elasticsearch. That said, you cannot simply index such a huge number of documents in a single request. The exception message is pretty clear: your bulk index request exceeded the HTTP content length limit of 104857600 bytes (100 MB). What you need to do is reduce the number of documents you pass to each IndexMany call so that you stay under this limit, and then invoke IndexMany multiple times until all 500,000 documents have been indexed.
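
The batching described above could be sketched roughly as follows. This is only an illustration: `Batching.InBatches` is a hypothetical helper written for this answer (it is not part of NEST), and the batch size is an assumption you would tune so that each request stays well under the 100 MB limit for your document sizes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of NEST: lazily splits a sequence
// into consecutive lists of at most batchSize items.
public static class Batching
{
    public static IEnumerable<List<T>> InBatches<T>(IEnumerable<T> source, int batchSize)
    {
        // Validate eagerly so bad arguments fail at the call site,
        // not on first enumeration.
        if (source == null) throw new ArgumentNullException("source");
        if (batchSize <= 0) throw new ArgumentOutOfRangeException("batchSize");
        return Iterate(source, batchSize);
    }

    private static IEnumerable<List<T>> Iterate<T>(IEnumerable<T> source, int batchSize)
    {
        var batch = new List<T>(batchSize);
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize); // start a fresh list for the next batch
            }
        }
        // Emit the final, possibly smaller, batch.
        if (batch.Count > 0) yield return batch;
    }
}

// Assumed usage with the client and Items from the question
// (1000 is an arbitrary starting point, not a recommended value):
//
//     foreach (var batch in Batching.InBatches(Items, 1000))
//     {
//         elasticClient.IndexMany(batch);
//     }
```

Each IndexMany call then sends one bulk request per batch, so no single request carries all 500,000 documents. It is worth inspecting the response returned by each call, since a bulk request can succeed at the HTTP level while individual documents fail.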