elasticsearch chewy-gem

Order Terms Aggregation by Geo Distance


So I have an issue here...

I'm using the chewy Ruby gem to communicate with Elasticsearch. This is the query I'm building:

=> #<Chewy::SnippetPagesIndex::Query:0x007f911c6b1610
 @_collection=nil,
 @_fully_qualified_named_aggs={"chewy::snippetpagesindex"=>{"chewy::snippetpagesindex::snippetpage"=>{}}},
 @_indexes=[Chewy::SnippetPagesIndex],
 @_named_aggs={},
 @_request=nil,
 @_response=nil,
 @_results=nil,
 @_types=[],
 @criteria=
  #<Chewy::Query::Criteria:0x007f911c6b1458
   @aggregations=
    {:group_by=>{:terms=>{:field=>"seo_area.suburb.id", :order=>{:_count=>"asc"}}, :aggs=>{:by_top_hit=>{:top_hits=>{:size=>10}}}}},
   @facets={},
   @fields=[],
   @filters=
    [{:geo_distance=>{:distance=>"100km", "seo_area.suburb.coordinates"=>"-27.9836052, 153.3977354"}},
     {:bool=>
       {:must_not=>[{:terms=>{:id=>[1]}}, {:terms=>{"seo_area.suburb.id"=>[5559]}}],
        :must=>[{:term=>{:path_category=>"garden-services"}}, {:term=>{:status=>"active"}}, {:exists=>{:field=>"path_area"}}],
        :should=>[]}}],
   @options=
    {:query_mode=>:must,
     :filter_mode=>:and,
     :post_filter_mode=>:and,
     :preload=>
      {:scope=>
        #<Proc:0x007f911c6b1700@/Users/serviceseeking/Work/serviceseeking/engines/seo/app/concepts/seo/snippet_page/twins/search.rb:45 (lambda)>},
     :loaded_objects=>true},
   @post_filters=[],
   @queries=[],
   @request_options={},
   @scores=[],
   @script_fields={},
   @search_options={},
   @sort=[{:_geo_distance=>{"seo_area.suburb.coordinates"=>"-27.9836052, 153.3977354", :order=>"asc", :unit=>"km"}}],
   @suggest={},
   @types=[]>,
 @options={}>

I'm using an Elasticsearch aggregation, so any sorting from the query/search phase is lost once I read the results from the aggregation buckets.

What I've been passing is this...

    aggs: {
      by_seo_area_suburb_id: {
        terms: {
          field: "seo_area.suburb.id",
          size: 10,
          order: { by_distance: "desc" }
        },
        aggs: {
          by_top_hit: {
            top_hits: { size: 10 }
          },
          by_distance: {
            geo_distance: {
              field: "seo_area.suburb.coordinates",
              origin: "52.3760, 4.894",
              ranges: [
                { from: 0, to: 1 },
                { from: 1, to: 2 }
              ]
            }
          }
        }
      }
    }

I'm getting this error though...

[500] {"error":{"root_cause":[{"type":"aggregation_execution_exception","reason":"Invalid terms aggregation order path [by_distance]. Terms buckets can only be sorted on a sub-aggregator path that is built out of zero or more single-bucket aggregations within the path and a final single-bucket or a metrics aggregation at the path end."}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"snippet_pages","node":"srrlBssmSEGsqpZnPnOJmA","reason":{"type":"aggregation_execution_exception","reason":"Invalid terms aggregation order path [by_distance]. Terms buckets can only be sorted on a sub-aggregator path that is built out of zero or more single-bucket aggregations within the path and a final single-bucket or a metrics aggregation at the path end."}}]},"status":500}

In short, it says...

Terms buckets can only be sorted on a sub-aggregator path that is built out of zero or more single-bucket aggregations within the path and a final single-bucket or a metrics aggregation at the path end.

Any ideas?


Solution

  • You have buckets like this:

    0-1

    1-2

    2-3

    and so on. These are not single-value buckets with a natural order, which is what the exception is telling you. To sort the terms buckets you need something that reduces each bucket to a single value, such as a metrics aggregation.

    Even if you could order by that, why would you? Every document with a distance between 1 and 2 would have the same value for comparison, so the ordering within that range would be undefined. If it is enough for you to know which documents fall into 0-1, 1-2 and so on, just reverse the aggregation order: aggregate by distance first, then add a terms sub-aggregation.
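
The reversed order described above could look roughly like the following Ruby hash. This is only a sketch: the aggregation names `by_distance` and `by_suburb` are made up for illustration, the origin is taken from the geo_distance filter in the question, and `unit: "km"` is an assumption (without it the range bounds are read as metres). How the hash is handed to chewy depends on the gem version and is not shown.

```ruby
require "json"

# geo_distance is now the outer aggregation; terms is nested inside each range.
reversed_aggs = {
  by_distance: {
    geo_distance: {
      field: "seo_area.suburb.coordinates",
      origin: "-27.9836052, 153.3977354", # same point as the filter in the question
      unit: "km",                         # assumption: ranges are meant in kilometres
      ranges: [
        { from: 0, to: 1 },
        { from: 1, to: 2 }
      ]
    },
    aggs: {
      by_suburb: {
        terms: { field: "seo_area.suburb.id", size: 10 },
        aggs: {
          by_top_hit: { top_hits: { size: 10 } }
        }
      }
    }
  }
}

# Print the aggregation as the JSON that would go into the request body.
puts JSON.pretty_generate(aggs: reversed_aggs)
```

Each distance range then contains its own list of suburb buckets, so "nearest first" holds at the granularity of the ranges, not per document.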

    All in all, I think you have a use case where aggregations are not what you want. Consider the following two documents:

    { name: "peter", location: [0,0] }
    { name: "peter", location: [100,0] }
    

    Obviously both peters would be merged into one bucket by a terms aggregation. But they have two different locations, so their distances will (nearly) always be different. So how can you order peters by distance? As soon as you aggregate on a field, the other fields become more or less decoupled from it, and you cannot use them for ordering.
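
The two-peters problem can be reproduced in plain Ruby, outside Elasticsearch. The sketch below assumes an origin of (0, 0), reads the location arrays as [lon, lat], and uses the haversine formula as a stand-in for whatever distance calculation Elasticsearch applies. It shows that one name maps to several distances, and that an explicit reduction (here the minimum) is needed before the groups can be ordered.

```ruby
EARTH_RADIUS_KM = 6371.0

# Great-circle distance in kilometres between two lat/lon points (haversine).
def haversine_km(lat1, lon1, lat2, lon2)
  to_rad = Math::PI / 180
  dlat = (lat2 - lat1) * to_rad
  dlon = (lon2 - lon1) * to_rad
  a = Math.sin(dlat / 2)**2 +
      Math.cos(lat1 * to_rad) * Math.cos(lat2 * to_rad) * Math.sin(dlon / 2)**2
  2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a))
end

docs = [
  { name: "peter", location: [0, 0] },
  { name: "peter", location: [100, 0] }
]
origin = { lat: 0.0, lon: 0.0 } # assumed origin, for illustration only

# Grouping by name (what a terms aggregation does) leaves several distances per group.
distances_by_name = docs.group_by { |doc| doc[:name] }.transform_values do |group|
  group.map do |doc|
    lon, lat = doc[:location]
    haversine_km(origin[:lat], origin[:lon], lat, lon)
  end
end

# Only after reducing each group to a single value is there something to sort on.
nearest_by_name = distances_by_name.transform_values(&:min)

puts distances_by_name.inspect
puts nearest_by_name.inspect
```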

    So if you want something like this, you most likely have to use a normal search. Have a look at this guide on how to sort a search by distance:

    https://www.elastic.co/guide/en/elasticsearch/guide/current/sorting-by-distance.html
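
As a rough sketch of the search-based route: the sort clause below is copied from the query dump in the question, and the grouping by suburb is then done on the client. The hits are invented sample data standing in for a real response (already sorted by distance, with the distance in each hit's `sort` array), and the `size` of 100 is an arbitrary choice.

```ruby
# Sort clause as it appears in the question's query object.
sort_by_distance = [
  {
    _geo_distance: {
      "seo_area.suburb.coordinates" => "-27.9836052, 153.3977354",
      order: "asc",
      unit: "km"
    }
  }
]
search_body = { sort: sort_by_distance, size: 100 } # size is an arbitrary example value

# Hypothetical hits, nearest first, in place of a real Elasticsearch response.
sorted_hits = [
  { "_id" => "11", "_source" => { "seo_area" => { "suburb" => { "id" => 5560 } } }, "sort" => [1.2] },
  { "_id" => "12", "_source" => { "seo_area" => { "suburb" => { "id" => 5571 } } }, "sort" => [3.4] },
  { "_id" => "13", "_source" => { "seo_area" => { "suburb" => { "id" => 5560 } } }, "sort" => [5.0] },
  { "_id" => "14", "_source" => { "seo_area" => { "suburb" => { "id" => 5582 } } }, "sort" => [7.9] }
]

# group_by keeps keys in first-seen order, so suburbs come out nearest first.
by_suburb = sorted_hits.group_by { |hit| hit.dig("_source", "seo_area", "suburb", "id") }

# Keep at most 10 hit ids per suburb, mirroring the top_hits size in the question.
top_per_suburb = by_suburb.transform_values { |hits| hits.first(10).map { |hit| hit["_id"] } }

puts top_per_suburb.inspect
```

The trade-off is that only the suburbs present in the fetched page of hits are grouped, so the page size has to be large enough for the use case.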