Search code examples
pythonhttpelasticsearchapache-zeppelinelasticsearch-py

elasticsearch-py search queries substantially slower than curl equivalent


In a Zeppelin notebook, running the following query with elasticsearch-py 5x

es = Elasticsearch(["es-host:9200"])
es.search(index="some_index", 
          doc_type="some_type", 
          body={"query": {"term": {"day": "2018_02_04"}}}
)

Takes 28 minutes to return.

From the same notebook, using curl to run:

curl -XGET 'http://es-host:9200/some_index/some_type/_search?pretty' -H 'Content-Type: application/json' -d'
{"query": {"term": {"day": "2018_02_04"}}}
'

returns basically instantly.

Why is the python library performance so poor, and what can be done to make that fast?


Solution

  • I do not understand why this works, but if I add a filter_path to the query, it returns as quickly as the raw curl:

    es = Elasticsearch(["es-host:9200"])
    results = es.search(index="some_index", 
          doc_type="some_type", 
          filter_path=['hits.hits._id'],
          body={"query": {"term": {"day": "2018_02_04"}}}
    )
    

    If anyone has an explanation for this behavior, I'd appreciate that.