My goal is to sort millions of logs by timestamp that I receive out of Elasticsearch.
Example logs:
{"realIp": "192.168.0.2", "@timestamp": "2020-12-06T02:00:09.000Z"}
{"realIp": "192.168.0.2", "@timestamp": "2020-12-06T02:01:09.000Z"}
{"realIp": "192.168.0.2", "@timestamp": "2020-12-06T02:02:09.000Z"}
{"realIp": "192.168.0.2", "@timestamp": "2020-12-06T02:04:09.000Z"}
Unfortunately, I am not able to get all the logs sorted out of Elastic. It seems like I have to do it by myself.
Approaches I have tried to get the data sorted out of elastic:
es = Search(index="somelogs-*").using(client).params(preserve_order=True)
for hit in es.scan():
print(hit['@timestamp'])
Another approach:
notifications = (es
.query("range", **{
"@timestamp": {
'gte': 'now-48h',
'lt' : 'now'
}
})
.sort("@timestamp")
.scan()
)
So I am looking for a way to sort these logs by myself or directly through Elasticsearch. Currently, I am saving all the data in a local 'logs.json' and it seems to me I have to iter over and sort it by myself.
Thank you Gino Mempin. It works!
But I also figured out, that a simple change does the same job.
by adding .params(preserve_order=True)
elasticsearch will sort all the data.
es = Search(index="somelog-*").using(client)
notifications = (es
.query("range", **{
"@timestamp": {
'gte': 'now-48h',
'lt' : 'now'
}
})
.sort("@timestamp")
.params(preserve_order=True)
.scan()
)