Tags: elasticsearch, amazon-emr, elasticsearch-py

Elasticsearch 6.3.1 returns different results on cloud and locally despite using dfs_query_then_fetch (queried with Python's elasticsearch package)


I am using Elasticsearch to query data. I search for a medical term and get the disease code back as output. Here is my sample query:

es.search(index="myindex", body={"query": {"match": {"text_field": "search_term"}}}, search_type='dfs_query_then_fetch')
# Expected output - ABC
# Local Output - ABC
# Output on Amazon EMR - XYZ

The problem is that when I run it on the cloud, the output is completely different.

I have exactly the same index on the cloud and locally, yet the results on the cloud are unexpected. We have an Amazon EMR instance where I have even tried re-creating the index, with no luck.

Local OS: Ubuntu 16.04
OS on Amazon EMR: Amazon Linux

Any help would be really appreciated.


Solution

  • Thanks to everyone who responded to my question.

    I figured out what the problem was.

    There's a bootstrap script running on AWS that starts the Elasticsearch service and, in parallel, runs my index-creation Python script.

    Because the cluster takes some time to come up, a few requests time out during index creation. As a result, my index ends up only partially created, which explains the differing results.

    Hope this is helpful for those running Elasticsearch on Amazon EMR.

    Cheers!
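
One way to avoid the race described above is to have the index-creation script wait until the cluster reports a usable health status before sending any documents, and then confirm the document count afterwards. The following is only a minimal sketch under stated assumptions, not the script from the answer: the helper names `wait_for_cluster` and `index_is_complete`, the retry counts, and the expected document count are all hypothetical. It relies on the `cluster.health`, `indices.refresh`, and `count` calls of the elasticsearch-py client, which is passed in as `es`.

```python
import time


def wait_for_cluster(es, wanted="yellow", attempts=30, delay=5.0, sleep=time.sleep):
    """Poll cluster health until `wanted` status is reached or attempts run out.

    `es` is assumed to be an elasticsearch-py client. Returns True once the
    cluster reports the wanted status (or better), False otherwise.
    """
    for _ in range(attempts):
        try:
            # Asks the server to wait up to 5s for the requested status.
            health = es.cluster.health(wait_for_status=wanted, timeout="5s")
            if not health.get("timed_out", False):
                return True
        except Exception:
            # The node may not be accepting connections yet. In real code,
            # narrow this to the client's connection/transport errors.
            pass
        sleep(delay)
    return False


def index_is_complete(es, index, expected_docs):
    """Check that `index` holds exactly `expected_docs` documents."""
    # Refresh first so recently indexed documents are visible to count.
    es.indices.refresh(index=index)
    return es.count(index=index)["count"] == expected_docs


# Hypothetical usage inside the index-creation script:
#
#   from elasticsearch import Elasticsearch
#   es = Elasticsearch()
#   if not wait_for_cluster(es):
#       raise SystemExit("cluster never became ready; not indexing")
#   ... create the index and send documents ...
#   if not index_is_complete(es, "myindex", expected_docs=1000):
#       raise SystemExit("index is incomplete; some requests likely failed")
```

Running the same count check against the local and the EMR cluster would also be a quick way to test the assumption that both indices are identical before comparing query results.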