Tags: elasticsearch, elasticsearch-aggregation, elasticsearch-dsl

ElasticSearch - Filtering a result and manipulating the documents


I have the following query, which works fine (it may not be the exact query I run):

{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "location",
            "query": {
              "geo_distance": {
                "distance": "16090km",
                "distance_type": "arc",
                "location.point": {
                  "lat": "51.794177",
                  "lon": "-0.063055"
                }
              }
            }
          }
        },
        {
          "geo_distance": {
            "distance": "16090km",
            "distance_type": "arc",
            "location.point": {
              "lat": "51.794177",
              "lon": "-0.063055"
            }
          }
        }
      ]
    }
  }
}

However, I also want to do the following as part of the same request, without affecting the existing query:

  • Find all documents that have field_name = 1
  • Sort those documents by geo_distance
  • Among the documents with field_name = 1 that share the same field_name_2 value (e.g. 2), keep only the closest one in the result and remove the rest

Update (further explanation):

Aggregations can't be used because we want to manipulate the documents in the result itself.

The existing order of the documents must also be maintained. For example:

If I have 20 documents sorted by a field, and 5 of them have field_name = 1, I would like to sort those 5 by distance and eliminate 4 of them, while still maintaining the original sort for the rest. (Possibly by doing the geo-distance sort and elimination before the actual query?)

I'm not sure how to do this, so any help is appreciated. I'm currently using ElasticSearch DSL DRF, but I can easily convert the query to ElasticSearch DSL.

Example documents (before manipulation):

[{
"field_name": 1,
"field_name_2": 2,
"location": ....
},
{
"field_name": 1,
"field_name_2": 2,
"location": ....
},
{
"field_name": 55,
"field_name_5": 22,
"location": ....
}]

Output (Desired):

[{
"field_name": 1,
"field_name_2": 2,
"location": .... <- closest
},
{
"field_name": 55,
"field_name_5": 22,
"location": ....
}]
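
To pin down the desired behaviour, here is a minimal client-side reference sketch in Python. It is not an Elasticsearch feature, only an illustration of the rule described above. It assumes each document's location holds a single point with numeric lat and lon (the examples elide the real structure), and the helper names haversine_km and keep_closest are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def keep_closest(docs, origin, match_value=1):
    """Among docs with field_name == match_value, keep only the closest
    one per field_name_2 value. All other docs are left untouched and
    the incoming order of the list is preserved."""
    best = {}  # field_name_2 value -> (distance, index of closest doc)
    for index, doc in enumerate(docs):
        if doc.get("field_name") != match_value:
            continue
        # Assumed shape: {"location": {"point": {"lat": ..., "lon": ...}}}
        point = doc["location"]["point"]
        distance = haversine_km(origin[0], origin[1],
                                point["lat"], point["lon"])
        key = doc.get("field_name_2")
        if key not in best or distance < best[key][0]:
            best[key] = (distance, index)

    keep = {index for _, index in best.values()}
    return [
        doc for index, doc in enumerate(docs)
        if doc.get("field_name") != match_value or index in keep
    ]
```

Calling keep_closest(docs, (51.794177, -0.063055)) on an already sorted list returns the same list minus the farther duplicates, so the original sort order is kept.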

Solution

  • This can be done using field collapsing, which is Elasticsearch's equivalent of grouping. Below is an example of how this can be achieved:

    {
      "collapse": {
        "field": "vin",
        "inner_hits": {
          "name": "closest_dealer",
          "size": 1,
          "sort": [
            {
              "_geo_distance": {
                "location.point": {
                  "lat": "latitude",
                  "lon": "longitude"
                },
                "order": "asc",
                "unit": "km",
                "distance_type": "arc",
                "nested_path": "location"
              }
            }
          ]
        }
      }
    }
    

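    The collapsing is done on the field vin, and inner_hits sorts the documents within each group by distance so that only the closest one is returned (size = 1). The closest document for each group is then available under inner_hits.closest_dealer of the corresponding hit in the response.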
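
    As a rough sketch of how this could be wired up from Python, the snippet below builds the collapse clause as a plain dict and pulls the closest document out of each collapsed hit. The function names build_collapse and closest_per_group are hypothetical, and the sort uses ascending order so that the nearest document comes first. The nested_path option is carried over from the answer; newer Elasticsearch versions may expect the nested sort option instead, so treat that part as an assumption to check against your version.

```python
def build_collapse(lat, lon, field="vin", nested_path="location"):
    """Return the top-level "collapse" clause as a plain dict."""
    return {
        "collapse": {
            "field": field,
            "inner_hits": {
                "name": "closest_dealer",
                "size": 1,  # only the nearest document per group
                "sort": [
                    {
                        "_geo_distance": {
                            "location.point": {"lat": lat, "lon": lon},
                            "order": "asc",  # nearest first
                            "unit": "km",
                            "distance_type": "arc",
                            # Assumption: option name as used in the answer
                            "nested_path": nested_path,
                        }
                    }
                ],
            },
        }
    }


def closest_per_group(response, inner_name="closest_dealer"):
    """Extract the closest document of each collapsed group from a
    search response given as a dict."""
    return [
        hit["inner_hits"][inner_name]["hits"]["hits"][0]["_source"]
        for hit in response["hits"]["hits"]
    ]
```

    The clause can be merged into an existing request body, for example body = {"query": existing_query, **build_collapse(51.794177, -0.063055)}. With elasticsearch-dsl, extra top-level keys like this can usually be attached through Search.extra(), although that is worth verifying for the version in use.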