Tags: python, python-3.x, elasticsearch, elasticsearch-dsl, elasticsearch-dsl-py

How to sort paginated logs by @timestamp with Elasticsearch?


My goal is to retrieve millions of logs from Elasticsearch, sorted by timestamp.

Example logs:

{"realIp": "192.168.0.2", "@timestamp": "2020-12-06T02:00:09.000Z"}
{"realIp": "192.168.0.2", "@timestamp": "2020-12-06T02:01:09.000Z"}
{"realIp": "192.168.0.2", "@timestamp": "2020-12-06T02:02:09.000Z"}
{"realIp": "192.168.0.2", "@timestamp": "2020-12-06T02:04:09.000Z"}

Unfortunately, I cannot get Elasticsearch to return all the logs in sorted order. It seems I have to sort them myself.

Approaches I have tried to get sorted data out of Elasticsearch:

from elasticsearch_dsl import Search

# `client` is an already configured Elasticsearch client
es = Search(index="somelogs-*").using(client).params(preserve_order=True)
for hit in es.scan():
    print(hit['@timestamp'])

Another approach:

notifications = (es
    .query("range", **{
        "@timestamp": {
            'gte': 'now-48h',
            'lt' : 'now'
        }
    })
    .sort("@timestamp")
    .scan()
)

So I am looking for a way to sort these logs, either myself or directly in Elasticsearch. Currently, I save all the data in a local 'logs.json', and it seems I have to iterate over it and sort it myself.
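
If the sorting has to happen locally, a minimal sketch could look like the following. It assumes that 'logs.json' holds one JSON object per line (as in the example logs above), that every timestamp uses the same layout as those examples, and that the whole file fits in memory. The names parse_ts and sort_logs are only illustrative.

```python
import json
from datetime import datetime

# Assumed timestamp layout, taken from the example logs above.
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_ts(log):
    """Return the log's @timestamp as a datetime object."""
    return datetime.strptime(log["@timestamp"], TS_FORMAT)


def sort_logs(lines):
    """Parse JSON lines and return the logs sorted by @timestamp, oldest first."""
    # Skip blank lines so a trailing newline in the file does not break parsing.
    logs = [json.loads(line) for line in lines if line.strip()]
    return sorted(logs, key=parse_ts)


# Usage, assuming logs.json contains one JSON object per line:
# with open("logs.json") as f:
#     for log in sort_logs(f):
#         print(log["@timestamp"])
```

With millions of logs this keeps every record in memory at once, so letting Elasticsearch return the data already sorted is usually preferable when that is possible.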


Solution

  • Thank you Gino Mempin. It works!

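    But I also figured out that a simple change does the same job.

    Adding .params(preserve_order=True) makes scan() keep the order requested by .sort(), so Elasticsearch returns all the data sorted. Without it, the scan helper replaces the sort with index order (_doc), which is why .sort() alone had no effect.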

    from elasticsearch_dsl import Search

    # `client` is an already configured Elasticsearch client
    es = Search(index="somelogs-*").using(client)
    notifications = (es
        .query("range", **{
            "@timestamp": {
                'gte': 'now-48h',
                'lt' : 'now'
            }
        })
        .sort("@timestamp")
        .params(preserve_order=True)
        .scan()
    )
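
To confirm that the hits really arrive in order, a small check like the one below may help. It is only a sketch: the helper name is_sorted_by_timestamp is made up for illustration, and it assumes that all timestamps share the same UTC ISO 8601 layout as the example logs, so that comparing them as strings matches chronological order.

```python
def is_sorted_by_timestamp(hits, field="@timestamp"):
    """Return True if the hits arrive in non-decreasing timestamp order."""
    previous = None
    for hit in hits:
        # Works for plain dicts and for any hit object that supports hit[field].
        current = hit[field]
        if previous is not None and current < previous:
            return False
        previous = current
    return True
```

Note that passing the scan() generator to this function consumes it, so it is better suited to a one-off test run than to the production loop.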