In Elasticsearch 2.3 (and in the latest releases) there is an index.max_result_window setting which restricts a search query so that from + size is no more than 10,000 entries (the default). For example:
from: 0 size: 10,000 is ok
from: 0 size: 10,001 is not ok
from: 9,000 size: 1,001 is not ok
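The three cases above reduce to a single check on from + size. A minimal sketch, assuming the default window of 10,000 (the helper name is hypothetical and not part of any Elasticsearch client):

```python
# Default value of index.max_result_window; an index may be configured differently.
MAX_RESULT_WINDOW = 10_000


def fits_result_window(from_: int, size: int, limit: int = MAX_RESULT_WINDOW) -> bool:
    """Return True when from + size stays within the result window."""
    return from_ + size <= limit


print(fits_result_window(0, 10_000))    # from: 0, size: 10,000
print(fits_result_window(0, 10_001))    # from: 0, size: 10,001
print(fits_result_window(9_000, 1_001)) # from: 9,000, size: 1,001
```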
In the latest release, 7.10, the documentation says this can be worked around by using search_after. However, due to legacy data, I need something similar in ES 2.3. Are there any good options?
Why do I need this? Our data has a parent/child hierarchy. One query we run against this data determines all the unique parents over a certain date range. Currently we retrieve this information using an aggregation query, i.e.
{
  "query": { "match_all_in_date_range": {} },
  "aggs": {
    "parents": {
      "terms": {
        "field": "parentId"
      }
    }
  }
}
Interestingly, this returns all the parents even if there are more than 10,000, i.e. it does not appear to be affected by the index.max_result_window limit.
But this aggregation is expensive and time-consuming. As a result, I'm evaluating whether it's possible to remove it and "aggregate" the data in our own code, i.e. retrieve all the objects, read their parentId field, and record the unique ids.
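The client-side "aggregation" described here could look roughly like the sketch below. It assumes each hit is a dict shaped like an Elasticsearch search hit, with the document under "_source"; the function name is hypothetical, and how the hits are fetched is left out.

```python
from typing import Any, Dict, Iterable, Set


def unique_parent_ids(hits: Iterable[Dict[str, Any]], field: str = "parentId") -> Set[Any]:
    """Collect the distinct values of `field` across a stream of search hits."""
    parents: Set[Any] = set()
    for hit in hits:
        # Assumption: the document body lives under "_source", as in search responses.
        source = hit.get("_source", {})
        if field in source:
            parents.add(source[field])
    return parents


# Hand-written sample hits, for illustration only.
sample_hits = [
    {"_source": {"parentId": "a"}},
    {"_source": {"parentId": "b"}},
    {"_source": {"parentId": "a"}},
    {"_source": {}},  # a document without a parentId is skipped
]
print(sorted(unique_parent_ids(sample_hits)))
```

Because the function accepts any iterable, the hits from several smaller searches can be chained together and passed in as one stream.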
But it looks like the index.max_result_window limit may break that idea, unless I'm mistaken. Two ideas I had to work around this would be
parentIds I've already retrieved (the downside being that it could take longer to run and will cause the query to grow until the end)
But I'd be curious to hear if there are other options available to me?
You could divide the search into smaller ones, separated by hour for example, or by another field, so that each search returns no more than 10,000 results.
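A rough sketch of that splitting, assuming the documents carry a date field (called "timestamp" here purely for illustration) and that one hour is a small enough slice. Both helper names are hypothetical; the code only builds request bodies and does not talk to a cluster.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, Iterator, Tuple


def time_windows(start: datetime, end: datetime,
                 step: timedelta = timedelta(hours=1)) -> Iterator[Tuple[datetime, datetime]]:
    """Yield consecutive [lower, upper) windows that cover start..end."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    cursor = start
    while cursor < end:
        upper = min(cursor + step, end)
        yield cursor, upper
        cursor = upper


def window_query(lower: datetime, upper: datetime,
                 field: str = "timestamp", size: int = 10_000) -> Dict[str, Any]:
    """Build a search body restricted to one window; `field` is an assumed name."""
    return {
        "from": 0,
        "size": size,
        "query": {
            "range": {
                # gte/lt keeps adjacent windows from overlapping
                field: {"gte": lower.isoformat(), "lt": upper.isoformat()}
            }
        },
    }


windows = list(time_windows(datetime(2020, 1, 1, 0), datetime(2020, 1, 1, 2, 30)))
bodies = [window_query(lo, hi) for lo, hi in windows]
print(len(bodies))
```

If the total hit count reported for a window still exceeds 10,000, that window would need to be split again with a smaller step before its results can be read in full.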