Search code examples
elasticsearchelasticsearch-aggregation

Sorting a reverse nested back to parent aggregation


I'm currently aggregating a collection by a multi-level nested field and calculating some sub-aggregation metrics from this collection and thats working using elasticsearch's reverse nested feature as described at Sub-aggregate a multi-level nested composite aggregation.

My current struggle is to find a way to sort the aggregations by one of the calculated metrics. For example, considering the following document and my current search call I would like to sort all the aggregations by their clicks sums.

I've tried using bucket_sort inside the inner aggs at the back_to_parent level but got the following java exception.

class org.elasticsearch.search.aggregations.bucket.nested.InternalReverseNested cannot be cast to class org.elasticsearch.search.aggregations.InternalMultiBucketAggregation 
(org.elasticsearch.search.aggregations.bucket.nested.InternalReverseNested and org.elasticsearch.search.aggregations.InternalMultiBucketAggregation are in unnamed module of loader 'app')
{
  id: '32ead132eq13w21',
  statistics: {
    clicks: 123,
    views: 456
  },
  categories: [{ //nested type
    name: 'color',
    tags: [{ //nested type
      slug: 'blue'
    },{
      slug: 'red'
    }]
  }]
}
GET /acounts-123321/_search
{
  size: 0,
  aggs: {
    categories_parent: {
        nested: {
          path: 'categories.tags'
        },
        aggs: {
          filtered: {
            filter: {
              term: { 'categories.tags.category': 'color' }
            },
            aggs: {
              by_slug: {
                terms: {
                  field: 'categories.tags.slug',
                  size: perPage
                },
                aggs: {
                  back_to_parent: {
                    reverse_nested: {},
                    aggs: {
                      clicks: {
                        sum: {
                          field: 'statistics.clicks'
                        }
                      },
                      custom_metric: {
                        scripted_metric: {
                          init_script: 'state.accounts = []',
                          map_script: 'state.accounts.add(new HashMap(params["_source"]))',
                          combine_script: 'double result = 0;
                            for (acc in state.accounts) {
                              result += ( acc.statistics.clicks + acc.statistics.impressions);
                            }
                            return result;',
                          reduce_script: 'double sum = 0;
                            for (state in states) {
                              sum += state;
                            }                            
                            return sum;'
                        }
                      },
                      by_tag_sort: {
                        bucket_sort: {
                          sort: [{ 'clicks.value': { order: 'desc' } }]
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }

Update:

It would also be nice to understand how it would be possible to sort the buckets by a custom metric calculated through a painless scripted_metric. I have updated the search call above adding a sample custom_metric that I wish to allow sorting through it.

I see that using bucket_sort directly does not work with the standard sort array we use for concrete fields. So the following does not seem to sort things. It also won't work for a sort script as well since [bucket_sort] only supports field based sorting.

by_tag_sort: {
  bucket_sort: {
    sort: [{ 'custom_metric.value': { order: 'desc' } }]
  }
}

Solution

  • bucket_sort expects to be run within a multi-bucket context but your reverse_nested aggregation is single-bucket (irrespective of the fact that it's a child of a multi-bucket terms aggregation).

    The trick is to use an empty-ish filters aggregation to generate a multi-bucket context and then run the bucket sort:

    {
      "size": 0,
      "aggs": {
        "categories_parent": {
          "nested": {
            "path": "categories.tags"
          },
          "aggs": {
            "filtered": {
              "filter": {
                "term": {
                  "categories.tags.category": "color"
                }
              },
              "aggs": {
                "by_slug": {
                  "terms": {
                    "field": "categories.tags.slug",
                    "size": 10
                  },
                  "aggs": {
                    "back_to_parent": {
                      "reverse_nested": {},
                      "aggs": {
                        "multi_bucket_emulator": {
                          "filters": {
                            "filters": {
                              "placeholder_match_all_query": {
                                "match_all": {}
                              }
                            }
                          },
                          "aggs": {
                            "clicks": {
                              "sum": {
                                "field": "statistics.clicks"
                              }
                            },
                            "by_tag_sort": {
                              "bucket_sort": {
                                "sort": [
                                  {
                                    "clicks.value": {
                                      "order": "desc"
                                    }
                                  }
                                ]
                              }
                            }
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
    

    Update: sorting by the result of a custom scripted metric value

    {
      "size": 0,
      "aggs": {
        "categories_parent": {
          "nested": {
            "path": "categories.tags"
          },
          "aggs": {
            "filtered": {
              "filter": {
                "term": {
                  "categories.tags.category": "color"
                }
              },
              "aggs": {
                "by_slug": {
                  "terms": {
                    "field": "categories.tags.slug",
                    "size": 10
                  },
                  "aggs": {
                    "back_to_parent": {
                      "reverse_nested": {},
                      "aggs": {
                        "multi_bucket_emulator": {
                          "filters": {
                            "filters": {
                              "placeholder_match_all_query": {
                                "match_all": {}
                              }
                            }
                          },
                          "aggs": {
                            "clicks": {
                              "sum": {
                                "field": "statistics.clicks"
                              }
                            },
                            "custom_metric": {
                              "scripted_metric": {
                                "init_script": "state.accounts = []",
                                "map_script": """state.accounts.add(params["_source"])""",
                                "combine_script": """
                                    double result = 0;
                                    for (def acc : state.accounts) {
                                      result += ( acc.statistics.clicks + acc.statistics.impressions);
                                    }
                                    return result;
                                """,
                                "reduce_script": """
                                  double sum = 0;
                                  for (def state : states) {
                                    sum += state;
                                  }                            
                                  return sum;
                                """
                              }
                            },
                            "by_tag_sort": {
                              "bucket_sort": {
                                "sort": [
                                  {
                                    "custom_metric.value": {
                                      "order": "desc"
                                    }
                                  }
                                ]
                              }
                            }
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }