elasticsearch, elasticsearch-5, elasticsearch-aggregation

Distinct records with geo_distance sort on aggregation ES


I'm working on a nearby API using Elasticsearch.

I'm trying to do four things in a single ES query:

  • filter on match conditions (including a script that keeps only records within each location's search radius)
  • get distinct records based on the company's key (I want one record per company)
  • sort records by geo_distance
  • add a distance field holding the distance between the user and the location

Here is my code:

const query = {
  query: {
      bool: {
        must: [
          customQuery,
          {
            term: {
              "schedule.isShopOpen": true,
            },
          },
          {
            term: {
              isBranchAvailable: true,
            },
          },
          {
            term: {
              branchStatus: "active",
            },
          },
          {
            match:{
              shopStatus: "active"
            }
          },
          {
            script: {
              script: {
                params: {
                  lat: parseFloat(req.lat),
                  lon: parseFloat(req.lon),
                },
                source:
                  "doc['location'].arcDistance(params.lat, params.lon) / 1000 <= doc['searchRadius'].value",
                lang: "painless",
              },
            },
          },
        ],
      },
  },
  aggs: {
    duplicateCount: {
      terms: {
        field: "companyKey",
        size: 10000,
      },
      aggs: {
        duplicateDocuments: {
          top_hits: {
            sort: [
              {
                _geo_distance: {
                  location: {
                    lat: parseFloat(req.lat),
                    lon: parseFloat(req.lon),
                  },
                  order: "asc",
                  unit: "km",
                  mode: "min",
                  distance_type: "arc",
                  ignore_unmapped: true,
                },
              },
            ],
            script_fields: {
              distance: {
                script: {
                  params: {
                    lat: parseFloat(req.lat),
                    lon: parseFloat(req.lon),
                  },
                  inline: `doc['location'].arcDistance(params.lat, params.lon)/1000`,
                },
              },
            },

            stored_fields: ["_source"],
            size: 1,
          },
        },
      },
    },
  },
};

Here's the output:

data: [
  {
    companyKey: "1234",
    companyName: "Floward",
    branchKey: "3425234",
    branch: "Mursilat",
    distance: 1.810064121687324,
  },
  {
    companyKey: "0978",
    companyName: "Dkhoon",
    branchKey: "352345",
    branch: "Wahah blue branch ",
    distance: 0.08931851500047634,
  },
  {
    companyKey: "567675",
    companyName: "Abdulaziz test",
    branchKey: "53425",
    branch: "Jj",
    distance: 0.011447273197846672,
  },
  {
    companyKey: "56756",
    companyName: "Mouj",
    branchKey: "345345",
    branch: "King fahad",
    distance: 5.822936713752124,
  },
];

I have two issues:

  • How do I sort the records by geo_distance?
  • Do the query clauses (match, script) also apply to the aggregation data?

Can you please help me solve these issues?


Solution

  • This would be a more appropriate query for your use case:

    {
      "query": {
        "bool": {
          "filter": [
            {
              "geo_distance": {
                "distance": "200km",
                "distance_type": "arc",
                "location": {
                  "lat": 40,
                  "lon": -70
                }
              }
            },
            {
              "match": {
                "shopStatus": "active"
              }
            }
          ]
        }
      },
      "collapse": {
        "field": "companyKey"
      },
      "sort": [
        {
          "_geo_distance": {
            "location": {
              "lat": 40,
              "lon": -70
            },
            "order": "asc",
            "unit": "km",
            "mode": "min",
            "distance_type": "arc",
            "ignore_unmapped": true
          }
        }
      ],
      "_source": ["*"], 
      "script_fields": {
        "distance_in_m": {
          "script": "doc['location'].arcDistance(40, -70)" // convert to unit required
        }
      }
    }
    
    1. filter instead of must: since you are only filtering documents, filter is faster because, unlike must, it does not score them.

    2. collapse: you can use the collapse parameter to collapse search results based on field values. Collapsing selects only the top sorted document per collapse key, so sorting by distance returns the nearest branch of each company.

    3. geo_distance query instead of a script, to find documents within the given distance.

    4. script_fields to return the distance for each hit.
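
The four points above can be sketched in JavaScript, in the style of the question's code. This is only an illustration under some assumptions: `buildNearbyQuery`, `toNearbyList` and the `maxDistanceKm` argument are hypothetical names, not part of the original answer, and the per-branch `searchRadius` script from the question is kept as an extra filter in case that check is still needed (the fixed-radius `geo_distance` filter alone does not reproduce it).

```javascript
// Hypothetical helper: builds the collapsed, distance-sorted search body.
// `maxDistanceKm` is an assumed upper bound, not a value from the question.
function buildNearbyQuery(lat, lon, maxDistanceKm) {
  const origin = { lat: parseFloat(lat), lon: parseFloat(lon) };
  return {
    query: {
      bool: {
        // filter clauses do not contribute to scoring
        filter: [
          { term: { "schedule.isShopOpen": true } },
          { term: { isBranchAvailable: true } },
          { term: { branchStatus: "active" } },
          { match: { shopStatus: "active" } },
          // Coarse pre-filter on a fixed radius around the user
          {
            geo_distance: {
              distance: `${maxDistanceKm}km`,
              distance_type: "arc",
              location: origin,
            },
          },
          // Per-branch radius check carried over from the question
          {
            script: {
              script: {
                lang: "painless",
                params: origin,
                source:
                  "doc['location'].arcDistance(params.lat, params.lon) / 1000 <= doc['searchRadius'].value",
              },
            },
          },
        ],
      },
    },
    // One hit per company: the top sorted document for each companyKey
    collapse: { field: "companyKey" },
    sort: [
      {
        _geo_distance: {
          location: origin,
          order: "asc",
          unit: "km",
          mode: "min",
          distance_type: "arc",
          ignore_unmapped: true,
        },
      },
    ],
    _source: ["*"],
    script_fields: {
      // arcDistance returns meters; divide by 1000 for kilometers
      distance: {
        script: {
          lang: "painless",
          params: origin,
          source: "doc['location'].arcDistance(params.lat, params.lon) / 1000",
        },
      },
    },
  };
}

// Flattens a search response body into plain records.
// Script field values come back as arrays under `fields`.
function toNearbyList(response) {
  return response.hits.hits.map((hit) => ({
    ...hit._source,
    distance: hit.fields.distance[0],
  }));
}
```

A call such as `buildNearbyQuery(req.lat, req.lon, 200)` would produce the request body, and `toNearbyList` would be applied to the response body returned by the search. On the second question: aggregations are computed over the documents matched by the query, so the match and script clauses also restricted what the `top_hits` aggregation saw in the original request.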