I want to read a large number (>15000) of entries from ElasticSearch via Spring-Data-ElasticSearch.
For this I followed this documentation: https://docs.spring.io/spring-data/elasticsearch/docs/3.2.6.RELEASE/reference/html/#elasticsearch.scroll
I copied the code snippet provided by the documentation, changing only the searchQuery:
SearchQuery searchQuery = new NativeSearchQueryBuilder()
    .withQuery(queryBuilder)
    .withPageable(PageRequest.of(0, 10))
    .build();

CloseableIterator<SampleEntity> stream = elasticsearchTemplate.stream(searchQuery, SampleEntity.class);

List<SampleEntity> sampleEntities = new ArrayList<>();
while (stream.hasNext()) {
    sampleEntities.add(stream.next());
}
The problem is that the stream always returns all entries and not just the ones for the requested page, which should be 10.
Did I miss something here or is it a bug?
Thanks in advance
When using the stream API, the page size is only used internally to determine the size of each chunk of data that is retrieved from Elasticsearch; it has no effect on the total number of elements returned by the stream.
So in your example, when you start consuming the stream, the first 10 elements are fetched using the Elasticsearch scroll API. When you request the 11th element, the next chunk of 10 elements is fetched using the internally stored scroll id and made available for consumption.
This is repeated until all data matching the query has been returned.
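
If you only want the 10 entries of the requested page rather than the complete result set, you can use a regular paged query instead of the stream. A minimal sketch, assuming the same queryBuilder and the queryForPage method available on ElasticsearchTemplate in 3.2.x:

SearchQuery pagedQuery = new NativeSearchQueryBuilder()
    .withQuery(queryBuilder)
    .withPageable(PageRequest.of(0, 10))
    .build();

// queryForPage executes a single search request and returns just that page,
// so no scroll context is opened and only 10 entries come back.
Page<SampleEntity> page = elasticsearchTemplate.queryForPage(pagedQuery, SampleEntity.class);
List<SampleEntity> sampleEntities = page.getContent();

Alternatively, if you do want the scroll-based stream, you can stop consuming it after 10 elements and call close() on the iterator so the scroll context on the Elasticsearch side is released.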