I am new to Elasticsearch and am using version 7.x. I am trying to index a list of POCO objects, but the BulkDescriptor seems to overwrite the data, so only the last object in the list ends up indexed. I want to index all of the items into the "myindex" index.
I tried indexing the items one by one in a for loop, but that takes too long, and then I came across BulkDescriptor.
This is the code that I am using for indexing:
ESClient:
public ElasticClient EsClient()
{
    Uri EsInstance = new Uri("http://localhost:9200");
    ConnectionSettings EsConfiguration = new ConnectionSettings(EsInstance)
        .DefaultIndex("myindex")
        // keep request/response bytes available for debugging
        .DisableDirectStreaming();
    ElasticClient esClient = new ElasticClient(EsConfiguration);
    return esClient;
}
Code for Indexing:
var bulkIndexer = new BulkDescriptor();
foreach (var item in items)
{
    bulkIndexer.Index<IndexDataItem>(i => i
        .Document(item)
        .Id(item.Id));
}
var bulkIndexResponse = _connectionToEs.EsClient().Bulk(b => bulkIndexer);
I have also tried moving the call var bulkIndexResponse = _connectionToEs.EsClient().Bulk(b => bulkIndexer); inside the foreach loop, but the result is the same.
Here is my POCO Class:
public class IndexDataItem
{
    public IndexDataItem()
    {
        DateModified = DateTime.Now;
        DateCreated = DateTime.Now;
    }

    public int Id { get; set; }
    public string Name { get; set; }
    public string DisplayName { get; set; }
    public string FullText { get; set; }
    public DateTime DateCreated { get; set; }
    public DateTime DateModified { get; set; }
    public DocumentLevel DocumentLevel { get; set; }
    public IndexDataField[] Fields { get; set; }
}
I want all of the objects in the list to be indexed under "myindex". Can anybody help with this?
Thanks in advance!
It looks like Rob may have suggested what the problem is in this case in the comments, but I wanted to add some additional info to help get you started.
The Bulk() API lets you send a batch of operations to Elasticsearch in a single request. If you need to index a very large number of documents, you'll need to send multiple bulk requests, handling retries and failures if they occur. For this, you may consider using the BulkAll() API in NEST, which is a .NET client abstraction for sending bulk operations from an IEnumerable<T> of documents:
public static IEnumerable<IndexDataItem> GetItems()
{
    var count = 0;
    while (count < 1_000_000)
    {
        yield return new IndexDataItem { Id = count + 1 };
        count++;
    }
}
var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
var settings = new ConnectionSettings(pool)
    .DefaultIndex("myindex");
var client = new ElasticClient(settings);

var bulkAllObservable = client.BulkAll(GetItems(), b => b
        .BackOffTime("30s")
        .BackOffRetries(2)
        .RefreshOnCompleted()
        .MaxDegreeOfParallelism(Environment.ProcessorCount)
        .Size(1000)
    )
    .Wait(TimeSpan.FromMinutes(60), next =>
    {
        // do something e.g. write number of pages to console
    });
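For a smaller list that fits comfortably in one request, a single Bulk() call with IndexMany() may be all you need. The following is a minimal sketch rather than a definitive fix: it assumes NEST 7.x, a local node at http://localhost:9200, and a trimmed-down copy of the POCO with made-up sample data. It also inspects the response, because a bulk request can succeed overall while individual items fail. Note too that documents sent to the same index with the same Id replace one another, so it is worth confirming that every item in your list has a distinct Id.

```csharp
using System;
using System.Collections.Generic;
using Nest;

// Trimmed-down copy of the POCO from the question (assumption: only two fields needed here).
public class IndexDataItem
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class BulkExample
{
    public static void Main()
    {
        var settings = new ConnectionSettings(new Uri("http://localhost:9200"))
            .DefaultIndex("myindex");
        var client = new ElasticClient(settings);

        // Hypothetical sample data; each item has a distinct Id.
        var items = new List<IndexDataItem>
        {
            new IndexDataItem { Id = 1, Name = "first" },
            new IndexDataItem { Id = 2, Name = "second" },
        };

        // One bulk request containing one index operation per item.
        var response = client.Bulk(b => b
            .Index("myindex")
            .IndexMany(items, (op, item) => op.Id(item.Id)));

        if (!response.IsValid)
        {
            // Covers both transport-level failures and per-item errors.
            Console.WriteLine(response.DebugInformation);
        }

        if (response.Errors)
        {
            // Report each operation that Elasticsearch rejected.
            foreach (var failed in response.ItemsWithErrors)
            {
                Console.WriteLine($"Document {failed.Id} failed: {failed.Error?.Reason}");
            }
        }
    }
}
```

If the response reports no errors but the index still holds only one document, printing the Id of each item before sending the request is a quick way to check whether they are all the same value.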