
Using NestedPath in Script Sort Elastic Search doesn't allow accessing outer properties


I need to sort by a single script value that combines two calculations. For each document, the script should return the minimum of the distances from a given location to the headquarters (HQ) and to each of the offices. Because a sort script can return only one value, I need to combine the HQ distance calculation and the offices distance calculation in one script.

I tried to combine them, but offices is a nested property and headquarters is a non-nested property. If I use "NestedPath", I cannot access the headquarters property; without "NestedPath", I cannot use the offices property. Here is the mapping:

         "offices" : {
            "type" : "nested",
            "properties" : {
              "coordinates" : {
                "type" : "geo_point",
                "fields" : {
                  "raw" : {
                    "type" : "text",
                    "index" : false
                  }
                },
                "ignore_malformed" : true
              },
              "state" : {
                "type" : "text"
              }
            }
          },
        "headquarters" : {
            "properties" : {
              "coordinates" : {
                "type" : "geo_point",
                "fields" : {
                  "raw" : {
                    "type" : "text",
                    "index" : false
                  }
                },
                "ignore_malformed" : true
              },
              "state" : {
                "type" : "text"
              }
            }
          }

And here is the script that I tried:

 "sort": [
    {
      "_script": {
        "nested" : {
          "path" : "offices"
        },
        "order": "asc",
        "script": {
          "lang": "painless",
          "params": {
            "lat": 28.9672,
            "lon": -98.4786
          },
          "source": "def hqDistance = 1000000;if (!doc['headquarters.coordinates'].empty){hqDistance = doc['headquarters.coordinates'].arcDistance(params.lat, params.lon) * 0.000621371;} def officeDistance= doc['offices.coordinates'].arcDistance(params.lat, params.lon) * 0.000621371; if (hqDistance < officeDistance) { return hqDistance; } return officeDistance;"
        },
        "type": "Number"
      }
    }
  ],

When I run the script, the headquarters logic does not seem to be executed at all; the results are sorted by office distance only.


Solution

  • Nested fields operate in a separate context and their content cannot be accessed from the outer level, nor vice versa.

    You can, however, access a document's raw _source.

    But there's a catch:

    • See, when iterating under the offices nested path, you were able to call .arcDistance because the coordinates are of type ScriptDocValues.GeoPoint.
    • But once you access the raw _source, you'll be dealing with an unoptimized set of java.util.ArrayLists and java.util.HashMaps.

    This means that even though you can iterate an array list:

    ...
    for (def office : params._source['offices']) {
       // office.coordinates is a trivial HashMap of {lat, lon}!
    }
    

    calculating geo distances won't be directly possible…

    …unless you write your own geoDistance function -- which is perfectly fine with Painless, but it'll need to be defined at the top of a script.

    No need to reinvent the wheel though: see the Stack Overflow question "Calculating distance between two points, using latitude longitude?" (https://stackoverflow.com/a/3694410/8160318).
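    Before wiring a hand-written distance function into a sort script, it can help to sanity-check the formula outside Elasticsearch. What follows is a minimal Python sketch (not Painless) of the spherical law of cosines that the referenced answer uses. The function name geo_distance_miles and the clamping step are assumptions added for illustration, not part of the original answer.

```python
import math


def geo_distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles via the spherical law of cosines."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    theta = math.radians(lon1 - lon2)
    cos_angle = (math.sin(phi1) * math.sin(phi2)
                 + math.cos(phi1) * math.cos(phi2) * math.cos(theta))
    # Assumption added here: clamp to [-1, 1], because rounding can push the
    # value slightly out of range for (near-)identical points and acos would fail.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    # 60 * 1.1515 is the same degrees-of-arc-to-miles factor the Painless script uses.
    return math.degrees(math.acos(cos_angle)) * 60 * 1.1515


if __name__ == "__main__":
    # Headquarters of the sample document vs. the query point from the sort params.
    print(round(geo_distance_miles(40.7128, -74.006, 28.9672, -98.4786)))
```

    Comparing a few values from this sketch against the numbers Elasticsearch returns in the sort field of each hit is a quick way to spot unit or argument-order mistakes.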

    A sample implementation

    Assuming your documents look like this:

    POST my-index/_doc
    {
      "offices": [
        {
          "coordinates": "39.9,-74.92",
          "state": "New Jersey"
        }
      ],
      "headquarters": {
        "coordinates": {
          "lat": 40.7128,
          "lon": -74.006
        },
        "state": "NYC"
      }
    }
    

    your sorting script could look like this:

    GET my-index/_search
    {
       "sort": [
        {
          "_script": {
            "order": "asc",
            "script": {
              "lang": "painless",
              "params": {
                "lat": 28.9672,
                "lon": -98.4786
              },
              "source": """
                // We can declare functions at the beginning of a Painless script
                // https://www.elastic.co/guide/en/elasticsearch/painless/current/painless-functions.html#painless-functions
                
                double deg2rad(double deg) {
                  return (deg * Math.PI / 180.0);
                }
                
                double rad2deg(double rad) {
                  return (rad * 180.0 / Math.PI);
                }
                
                // https://stackoverflow.com/a/3694410/8160318
                double geoDistanceInMiles(def lat1, def lon1, def lat2, def lon2) {
                  double theta = lon1 - lon2;
                  double dist = Math.sin(deg2rad(lat1)) * Math.sin(deg2rad(lat2)) + Math.cos(deg2rad(lat1)) * Math.cos(deg2rad(lat2)) * Math.cos(deg2rad(theta));
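                  // clamp to [-1, 1]: rounding can push the value just past 1 for
                  // (near-)identical points, and Math.acos would then return NaN
                  dist = Math.acos(Math.max(-1.0, Math.min(1.0, dist)));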
                  dist = rad2deg(dist);
                  return dist * 60 * 1.1515;
                }
    
                // start off arbitrarily high            
                def hqDistance = 1000000;
    
                if (!doc['headquarters.coordinates'].empty) {
                  hqDistance = doc['headquarters.coordinates'].arcDistance(params.lat, params.lon) * 0.000621371;
                }
                
                // assume office distance as large as hq distance
                def officeDistance = hqDistance;
                
                // iterate each office and compare it to the currently lowest officeDistance
                for (def office : params._source['offices']) {
                  // the coordinates are formatted as "lat,lon" so let's split...
                  def latLong = Arrays.asList(office.coordinates.splitOnToken(","));
                  // ...and parse them before passing onwards
                  def tmpOfficeDistance = geoDistanceInMiles(Double.parseDouble(latLong[0]),
                                                             Double.parseDouble(latLong[1]),
                                                             params.lat,
                                                             params.lon);
                  // we're interested in the nearest office...
                  if (tmpOfficeDistance < officeDistance) {
                    officeDistance = tmpOfficeDistance;
                  }
                }
                
                if (hqDistance < officeDistance) {
                  return hqDistance;
                }
                
                return officeDistance;
              """
            },
            "type": "Number"
          }
        }
      ]
    }
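
    The combining logic of the script above (take the smaller of the headquarters distance and the nearest office distance) can be prototyped offline against a copy of a document's _source. This is a hedged Python sketch, not Painless: sort_key, parse_point and FALLBACK_MILES are hypothetical names, and only the two coordinate shapes from the sample document ("lat,lon" string and {lat, lon} object) are handled. A geo_point field accepts other formats that this sketch ignores.

```python
import math

FALLBACK_MILES = 1000000  # same "arbitrarily high" default as the script


def geo_distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles via the spherical law of cosines."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    cos_angle = (math.sin(phi1) * math.sin(phi2)
                 + math.cos(phi1) * math.cos(phi2)
                 * math.cos(math.radians(lon1 - lon2)))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    return math.degrees(math.acos(cos_angle)) * 60 * 1.1515


def parse_point(value):
    """Return (lat, lon) from a "lat,lon" string or a {"lat", "lon"} dict."""
    if isinstance(value, dict):
        return float(value["lat"]), float(value["lon"])
    lat, lon = value.split(",")
    return float(lat), float(lon)


def sort_key(source, lat, lon):
    """Smaller of the HQ distance and the nearest office distance, in miles."""
    points = []
    hq = source.get("headquarters", {}).get("coordinates")
    if hq is not None:
        points.append(hq)
    for office in source.get("offices", []):
        if office.get("coordinates"):
            points.append(office["coordinates"])
    best = FALLBACK_MILES
    for point in points:
        best = min(best, geo_distance_miles(*parse_point(point), lat, lon))
    return best


if __name__ == "__main__":
    sample = {
        "offices": [{"coordinates": "39.9,-74.92", "state": "New Jersey"}],
        "headquarters": {"coordinates": {"lat": 40.7128, "lon": -74.006},
                         "state": "NYC"},
    }
    print(sort_key(sample, 28.9672, -98.4786))
```

    For the sample document the New Jersey office lies closer to the query point than the headquarters, so the office distance is the value that ends up as the sort key.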
    

    Shameless plug: I dive deep into Elasticsearch scripting in a dedicated chapter of my ES Handbook.