I am using the following code to loop/scroll over all documents in my Elasticsearch box:
const string indexName = "bla";
var client = GetClient(indexName);
const int scrollTimeout = 1000;
var initialResponse = client.Search<Document>(scr => scr
    .Index(indexName)
    .From(0)
    .Take(100)
    .MatchAll()
    .Scroll(scrollTimeout));
List<XYZ> results;
results = new List<XYZ>();
if (!initialResponse.IsValid || string.IsNullOrEmpty(initialResponse.ScrollId))
    throw new Exception(initialResponse.ServerError.Error.Reason);
if (initialResponse.Documents.Any())
    results.AddRange(initialResponse.Documents);
var scrollid = initialResponse.ScrollId;
bool isScrollSetHasData = true;
while (isScrollSetHasData)
{
    var loopingResponse = client.Scroll<XYZ>(scrollTimeout, scrollid);
    if (loopingResponse.IsValid)
    {
        results.AddRange(loopingResponse.Documents);
        scrollid = loopingResponse.ScrollId;
    }
    isScrollSetHasData = loopingResponse.Documents.Any();
    // do some amazing stuff
}
client.ClearScroll(new ClearScrollRequest(scrollid));
For some reason loopingResponse is empty much sooner than expected - i.e. the scroll finishes. Can someone see something fundamentally wrong with my code? Thanks!
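Looking at your code, I think scrollTimeout could be the problem. The value tells Elasticsearch how long to keep the search context alive between two scroll requests, and every scroll request renews it, so it has to cover the time you spend processing one batch. 1000 ms is easily exceeded once the loop does real work; the context then expires, the next scroll comes back invalid with no documents, and your loop treats that as the end of the data. You could try increasing it to several minutes to find the best number for your case: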
var scrollTimeout = new Time(TimeSpan.FromMinutes(3));
or, according to the source code, you could pass a string with a time unit (micros, nanos, ms, s, m, h, and d):
var response = client.Search<Document>(scr => scr.Index(indexName)
...
.Scroll("3m")
);
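Putting this together, the whole loop could look roughly like the sketch below. It assumes a NEST 6.x/7.x client; ReadAll and ScrollExample are hypothetical names, XYZ is the placeholder type from the question, and "3m" is only a starting value to tune. Unlike the original loop, it throws when a scroll response is invalid, so an expired search context shows up as an error instead of looking like the end of the data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Nest;

public class XYZ { }  // placeholder document type, as in the question

public static class ScrollExample
{
    // Hypothetical helper: reads every document of an index via the scroll API.
    public static List<XYZ> ReadAll(IElasticClient client, string indexName)
    {
        // Keep-alive per batch. It only has to cover the processing time of one
        // page, because each scroll request renews it. "3m" is an assumed value.
        const string scrollTimeout = "3m";
        var results = new List<XYZ>();

        var response = client.Search<XYZ>(s => s
            .Index(indexName)
            .Size(100)
            .MatchAll()
            .Scroll(scrollTimeout));

        if (!response.IsValid)
            throw new Exception(response.DebugInformation);

        var scrollId = response.ScrollId;
        try
        {
            while (response.Documents.Any())
            {
                results.AddRange(response.Documents);
                // do some amazing stuff with the current page

                response = client.Scroll<XYZ>(scrollTimeout, scrollId);

                // An expired context yields an invalid response; fail loudly
                // rather than silently ending the loop.
                if (!response.IsValid)
                    throw new Exception(response.DebugInformation);

                scrollId = response.ScrollId;
            }
        }
        finally
        {
            // Release the search context on the server once we are done.
            if (!string.IsNullOrEmpty(scrollId))
                client.ClearScroll(c => c.ScrollId(scrollId));
        }

        return results;
    }
}
```

If the per-page work is slow, it is usually better to shrink the page size or move the heavy processing out of the loop than to keep raising the timeout, since open scroll contexts hold resources on the cluster.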