Tags: c#, postgresql, elasticsearch, amazon-redshift, nest

How to bulk insert 3+ million records into Elasticsearch using ASP.NET Core & the NEST ElasticClient


I am facing a problem while reading data from a Postgres database and inserting it into Elasticsearch with the NEST library in an ASP.NET Core console application.

My approach:

  1. Get the total row count from Redshift.
  2. Query Redshift for ~100,000 rows at a time.
    • Split each 100,000-row page into ten chunks of 10,000 rows.
    • Bulk insert each ~10,000-record chunk into ES.
  3. Run the next query with the offset advanced by 100,000 (a sketch of this loop follows below).
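
For illustration, here is a minimal sketch of that loop. It is not the code from the question: Dapper and Npgsql for the Redshift query, the MyDoc POCO, the table name my_table, and the index name my-index are all assumed placeholders.

    // Hypothetical sketch of the paging + chunking approach described above.
    using System;
    using System.Linq;
    using System.Threading.Tasks;
    using Dapper;                 // QueryAsync
    using Nest;
    using Npgsql;                 // Redshift speaks the Postgres wire protocol

    public class MyDoc
    {
        public int Id { get; set; }
        // other columns mapped to properties
    }

    public static class Loader
    {
        public static async Task LoadAsync(string pgConnString, IElasticClient es, long totalRows)
        {
            const int pageSize = 100_000;   // rows per Redshift query
            const int chunkSize = 10_000;   // documents per Elasticsearch bulk request

            await using var conn = new NpgsqlConnection(pgConnString);
            await conn.OpenAsync();

            for (long offset = 0; offset < totalRows; offset += pageSize)
            {
                // Step 2: fetch one page of rows.
                var rows = (await conn.QueryAsync<MyDoc>(
                    "SELECT * FROM my_table ORDER BY id LIMIT @limit OFFSET @offset",
                    new { limit = pageSize, offset })).ToList();

                // Steps 2a/2b: split the page into chunks and bulk index each one.
                foreach (var chunk in rows.Chunk(chunkSize))
                {
                    var response = await es.BulkAsync(b => b
                        .Index("my-index")
                        .IndexMany(chunk));

                    if (response.Errors)
                    {
                        // Inspect response.ItemsWithErrors and log/retry as needed.
                    }
                }
            }
        }
    }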

It works for 9 or 10 iterations, but then sometimes fails on the Elasticsearch side.

This process also takes a long time; I am referring only to the Elasticsearch insert time.

Errors:

The request was canceled due to the configured HttpClient.Timeout of 60 seconds elapsing

TimeoutException: The operation was canceled.

IOException: Unable to read data from the transport connection: The I/O operation has been aborted because of either a thread exit or an application request..

SocketException: The I/O operation has been aborted because of either a thread exit or an application request.

Could anyone help me resolve this problem and tell me how I can insert 10+ million records into Elasticsearch successfully?


Solution

  • I am unsure whether the issue lies elsewhere in your solution, but this is what works for me:

    // Send the documents in chunks of 5,000 per bulk request.
    var chunks = item.Items.Chunk(5000);
    
    foreach (var chunk in chunks)
    {
        // Bulk update each chunk, inferring ids from the documents and
        // retrying version conflicts up to two times.
        await nestClient.BulkAsync(x => x
            .Index(item.Index)
            .UpdateMany(chunk, (y, z) => y.IdFrom(z, true).Doc(z).Id(z.Id).RetriesOnConflict(2)));
    }
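
    Note that Enumerable.Chunk is only available on .NET 6 and later; on older runtimes you would need your own batching helper.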
    

    You might also want to check the request timeout on your client; mine:

    ConnectionSettings settings = new ConnectionSettings(pool)
        /* other settings */
        .RequestTimeout(TimeSpan.FromMinutes(3));
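
    For completeness, a minimal client setup along those lines might look like this; the node URI, the SingleNodeConnectionPool, and the default index are placeholders rather than values from the answer:

    using System;
    using Elasticsearch.Net;   // connection pool types live here
    using Nest;

    var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));

    ConnectionSettings settings = new ConnectionSettings(pool)
        .DefaultIndex("my-index")                     // placeholder index name
        .RequestTimeout(TimeSpan.FromMinutes(3))      // allow slow bulk requests to finish
        .MaximumRetries(3)                            // retry transient transport failures
        .EnableHttpCompression();                     // smaller payloads on the wire

    var nestClient = new ElasticClient(settings);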