I am working with the DSL for Elasticsearch in Python. My goal is to work with Elasticsearch response data in a loop as easily as possible using elasticsearch-dsl-py.
import datetime
import json
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search
e_search = Elasticsearch([{'host': 'my-alias', 'port': 5648}])
s = Search(using=e_search, index='sampleindex-2019.10') \
    .filter('range', **{'@timestamp': {'gte': 1571633450000, 'lt': 1571669450000, 'format': 'epoch_millis'}})
When I execute this I get the following values:
response = s.execute()
print(response.success())
>> True
print(response.took)
>> 41
print(response.hits.total)
>> 6582
However, when I attempt to loop over all of the results it only seems to print out 10 hits:
for hit in response:
    print(hit)
<Hit(sampleindex-2019.10/nQGt7G0BGh3E1MmaFw8e): {'startTime': '2019-10-21T13:57:05.621300916+09:00', 'header...'}>
<Hit(sampleindex-2019.10/egCp7G0BGh3E1Mmaq9bC): {'startTime': '2019-10-21T13:53:15.32923433+09:00', 'headers...'}>
<Hit(sampleindex-2019.10/hACo7G0BGh3E1MmaNsXk): {'headers': {'http_version': 'HTTP/1.1', 'http_user_agent': ...}>
<Hit(sampleindex-2019.10/VgCp7G0BGh3E1Mmae9Tv): {'headers': {'http_version': 'HTTP/1.1', 'http_user_agent': ...}>
<Hit(sampleindex-2019.10/nQGt7G0BGh3E1MmaFw8e): {'startTime': '2019-10-21T13:57:05.621300916+09:00', 'header...'}>
<Hit(sampleindex-2019.10/cwGv7G0BGh3E1Mma1Ddj): {'headers': {'http_version': 'HTTP/1.1', 'http_user_agent': ...}>
<Hit(sampleindex-2019.10/PgGv7G0BGh3E1MmaMzCA): {'startTime': '2019-10-21T13:59:11.83491578+09:00', 'headers...'}>
<Hit(sampleindex-2019.10/4wGw7G0BGh3E1MmaSjzb): {'headers': {'http_version': 'HTTP/1.1', 'http_user_agent': ...}>
<Hit(sampleindex-2019.10/cAGs7G0BGh3E1Mma_Q5Z): {'headers': {'http_version': 'HTTP/1.1', 'http_user_agent': ...}>
<Hit(sampleindex-2019.10/6AGw7G0BGh3E1Mma60OW): {'headers': {'http_version': 'HTTP/1.1', 'http_user_agent': ...}>
If I want to work with this output data, for example by looping over all of the results and storing information from each hit in a dictionary, how can I do that as easily as possible with elasticsearch-dsl-py?
I found this excerpt in the GitHub docs (also at Read The Docs):
To specify the from/size parameters, use the Python slicing API:
s = s[10:20]
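If I understand this correctly, the slice start becomes `from` and the slice length becomes `size`, so paging would mean computing a new slice for every page. Here is a minimal sketch of that arithmetic. The helper name `page_bounds` and the default page size of 10 are my own assumptions for illustration, not part of elasticsearch-dsl; only the `s[start:stop]` slicing comes from the docs.

```python
def page_bounds(page, page_size=10):
    """Return (start, stop) slice bounds for a zero-based page number.

    Hypothetical helper, not an elasticsearch-dsl function.
    """
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    start = page * page_size
    return start, start + page_size


# Second page of 10 results, i.e. the same range as s[10:20] above.
start, stop = page_bounds(1)

# With a real Search object this would be applied as (not run here):
# s = s[start:stop]
```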
If you want to access all the documents matched by your query you can use the scan method which uses the scan/scroll elasticsearch API:
for hit in s.scan():
    print(hit.title)
Note that in this case the results won't be sorted.
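Based on that excerpt, this is roughly what I have in mind for getting everything into a dictionary. It is only a sketch: the helper name `hits_to_dict` is hypothetical, and I am assuming each hit exposes its document id as `hit.meta.id` and its fields via `hit.to_dict()`. The helper accepts any iterable of such objects, so it should work with either `response` or `s.scan()`.

```python
def hits_to_dict(hits):
    """Map each hit's document id to its fields as a plain dict.

    Hypothetical helper; assumes every hit has `.meta.id` and `.to_dict()`.
    """
    results = {}
    for hit in hits:
        results[hit.meta.id] = hit.to_dict()
    return results


# With a real Search object this would be called as (not run here):
# all_docs = hits_to_dict(s.scan())
```

Since `scan()` does not sort, I would not rely on the order of the keys in the resulting dictionary.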