python elasticsearch filter get pyelasticsearch

How to select or retrieve only specific fields after elasticsearch query?


I am able to query an index in Elasticsearch, and now I want to narrow the returned data down to some specific fields. But I keep getting errors.

Here is my query:

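from elasticsearch import Elasticsearch

# "myhost" and 0000 are placeholders for the real host and port.
es = Elasticsearch(hosts=[{"host": "myhost", "port": 0000}])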


search_body = {
    "bool": {
        "filter": [
            {"exists": {"field": "customer_name"}},
            {"match_phrase": {"city": "chicago"}},
        ]
    }
}

results = es.search(index="some_index", query=search_body)

I can easily get results up to this point. But since the returned documents have so many fields, I want to retrieve only specific fields before converting the results into a dataframe. I could convert everything into a dataframe and filter afterwards, but that is not optimal.


I tried adding the _source and fields options, like this:

search_body = {
    "bool": {
        "filter": [
            {"exists": {"field": "customer_name"}},
            {"match_phrase": {"city": "chicago"}},
        ]
    },
    "_source": {"fields": {"includes": ["customer_name", "city", "company", "company_address"]}},
}

and other variants like,

"fields": {"includes":["customer_name", "city", "company", "company_address"] }

# or 

"_source":{"includes":["customer_name", "city", "company", "company_address"] }

# and several others.

I keep getting this error:

    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(
elasticsearch.exceptions.RequestError: RequestError(400, 'parsing_exception', '[bool] malformed query, expected [END_OBJECT] but found [FIELD_NAME]')

I followed:

What am I missing here?


Solution

  • The main issue is whether "search_body" is passed as body or as query.

    If "search_body" is as given below, it cannot be passed as query, because query accepts only the query clause itself (the "bool" clause here), not a full request body that also carries _source. Asking for _source inside the query makes the request malformed, which is what the parsing_exception reports.

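    # A full request body: the query clause goes under "query",
    # and "_source" sits next to it with the fields to return.
    search_body = {
        "query": {
            "bool": {
                "filter": [
                    {"exists": {"field": "customer_name"}},
                    {"match_phrase": {"city": "chicago"}},
                ]
            }
        },
        "_source": {"includes": ["customer_name", "city", "company", "company_address"]},
    }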
    

    This will pass because the request is sent as body, which contains the "query" and a separate "_source" field that limits the returned fields.

    es = Elasticsearch(hosts=[{"host": "myhost", "port": 0000}])
    
    results = es.search(index="some_index", body=search_body)
    

    This will fail because the whole body, including the _source filter, is passed where only a query clause is expected.

    es = Elasticsearch(hosts=[{"host": "myhost", "port": 0000}])
    
    results = es.search(index="some_index", query=search_body)
    

    This second request will pass if search_body contains only the query clause:

    search_body = {
        "bool": {
            "filter": [
                {"exists": {"field": "customer_name"}},
                {"match_phrase": {"city": "chicago"}},
            ]
        }
    }
    

    By naming convention, though, the variable is better called "query_body":

    query_body = {
        "bool": {
            "filter": [
                {"exists": {"field": "customer_name"}},
                {"match_phrase": {"city": "chicago"}},
            ]
        }
    }
    

    and requested as:

    es = Elasticsearch(hosts=[{"host": "myhost", "port": 0000}])
    
    results = es.search(index="some_index", query=query_body)
    

    So query and body are two different ways of sending a search request to an index: query takes only the query clause, while body takes the whole request, including options such as _source.
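
The distinction can be made concrete with a small helper. This is only a minimal sketch: build_search_body is a hypothetical name, not part of the client, and all it does is assemble the dictionary that would be passed as body from the clause that would be passed as query.

```python
def build_search_body(query_clause, fields=None):
    """Wrap a bare query clause into a full search request body.

    Hypothetical helper: query_clause is what would go to `query=`,
    the returned dict is what would go to `body=`.
    """
    body = {"query": query_clause}
    if fields:
        # "_source" sits next to "query", never inside it.
        body["_source"] = {"includes": list(fields)}
    return body


query_body = {
    "bool": {
        "filter": [
            {"exists": {"field": "customer_name"}},
            {"match_phrase": {"city": "chicago"}},
        ]
    }
}

search_body = build_search_body(
    query_body, ["customer_name", "city", "company", "company_address"]
)

# With a connected client `es`, the two call styles would then be:
# es.search(index="some_index", query=query_body)  # query clause only
# es.search(index="some_index", body=search_body)  # full request body
```

Keeping the clause and the body in separate variables makes it harder to pass a full body where only a query clause is expected.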

    Note: the Python elasticsearch client may soon deprecate the body argument of search. Once that happens, the fields to return will have to be requested some other way than through body.
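
If body does go away, one option is to pass the field list as its own keyword argument next to query. The sketch below assumes that the installed client accepts _source_includes on search, which is worth checking against the documentation for your client version. search_rows is a hypothetical helper, and the response is assumed to have the usual hits.hits[]._source layout.

```python
def search_rows(es, index, query_clause, fields):
    """Run a search that returns only `fields` and flatten the hits.

    Hypothetical helper. Assumes `es.search` accepts the `query` and
    `_source_includes` keyword arguments (check your client version).
    """
    response = es.search(
        index=index,
        query=query_clause,
        _source_includes=list(fields),
    )
    # Each hit keeps the selected document fields under "_source".
    return [hit["_source"] for hit in response["hits"]["hits"]]
```

The returned list of dicts can be handed to pandas.DataFrame(rows) as is, so no filtering is needed after the conversion.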

    Hope it helps others.