Search code examples
elasticsearchfilteraggregategroup

ElasticSearch - merge/combine the results by group


Following is the data stored in ElasticSearch:

[
  {
    "id": 1,
    "class": "class 1",
    "name": "Scott",
    "scores": [
      { "year": 2022, "score": 100 },
      { "year": 2011, "score": 80 },
      { "year": 2003, "score": 70 }
    ]
  },
  {
    "id": 2,
    "class": "class 1",
    "name": "Gabriel",
    "scores": [
      { "year": 2015, "score": 90 },
      { "year": 2011, "score": 70 }
    ]
  },
  {
    "id": 3,
    "class": "class 2",
    "name": "Scott",
    "scores": [
      { "year": 2022, "score": 100 },
      { "year": 2021, "score": 100 },
      { "year": 2003, "score": 80 }
    ]
  },
  {
    "id": 4,
    "class": "class 2",
    "name": "Pierce",
    "scores": [
      { "year": 2022, "score": 70 }
    ]
  }
]

Is there a way in ElasticSearch to merge/combine scores into one array by specific group? (keep the duplicates values)

For example:

  1. Group by class, it will show the scores of class 1 and class 2, and just keep the class and scores fields, the result would be:
[
  {
    "class": "class 1",
    "scores": [
    { "year": 2022, "score": 100 },
    { "year": 2015, "score": 90 },
    { "year": 2011, "score": 80 },
    { "year": 2011, "score": 70 },
    { "year": 2003, "score": 70 }
    ]
  },
  {
    "class": "class 2",
    "scores": [
    { "year": 2022, "score": 100 },
    { "year": 2022, "score": 70 },
    { "year": 2021, "score": 100 },
    { "year": 2003, "score": 80 }
    ]
  }
]
  1. Group by name, it will put all the scores of Scott into one array, and keep only name and scores fields:
[
  {
    "name": "Scott",
    "scores": [
      { "year": 2022, "score": 100 },
      { "year": 2022, "score": 100 },
      { "year": 2021, "score": 100 },
      { "year": 2011, "score": 80 },
      { "year": 2003, "score": 80 },
      { "year": 2003, "score": 70 }
    ]
  },
  {
    "name": "Gabriel",
    "scores": [
      { "year": 2015, "score": 90 },
      { "year": 2011, "score": 70 }
    ]
  },
  {
    "name": "Pierce",
    "scores": [
      { "year": 2022, "score": 70 }
    ]
  }
]

Thanks!


Solution

  • Disclaimer: Tighten your seatbelt it is going to be verbose ^^

    TLDR;

    Yes, it is possible, using term aggregation.

    Such as, grouping by class:

    You will find a bucket called byClass of type term. Elastic will create bucket of documents per value in the field class.

    -> class 1 and class 2

    But as you will notice it create more aggregation in this aggregation.

    -> nestedAGG, byyear and bynotes

    The first one is a specificity of Elastic.

    The 2 other further break down the bucket by year then notes respectively.

    GET /71128503/_search
    {
      "size": 0,
      "query": {
        "match_all": {}
      },
      "aggs": {
        "byClass": {
          "terms": {
            "field": "class",
            "size": 10
          },
          "aggs": {
            "nestedAGG": {
              "nested": {
                "path": "scores"
              },
              "aggs": {
                "byyear": {
                  "terms": {
                    "field": "scores.year",
                    "size": 10
                  },
                  "aggs": {
                    "bynotes": {
                      "terms": {
                        "field": "scores.score",
                        "size": 10
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
    
    {
      ...
    
      "aggregations" : {
        "byClass" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [
            {
              "key" : "class 1",
              "doc_count" : 2,
              "nestedAGG" : {
                "doc_count" : 5,
                "byyear" : {
                  "doc_count_error_upper_bound" : 0,
                  "sum_other_doc_count" : 0,
                  "buckets" : [
                    {
                      "key" : 2011,
                      "doc_count" : 2,
                      "bynotes" : {
                        "doc_count_error_upper_bound" : 0,
                        "sum_other_doc_count" : 0,
                        "buckets" : [
                          {
                            "key" : 70,
                            "doc_count" : 1
                          },
                          {
                            "key" : 80,
                            "doc_count" : 1
                          }
                        ]
                      }
                    },
                    {
                      "key" : 2003,
                      "doc_count" : 1,
                      "bynotes" : {
                        "doc_count_error_upper_bound" : 0,
                        "sum_other_doc_count" : 0,
                        "buckets" : [
                          {
                            "key" : 70,
                            "doc_count" : 1
                          }
                        ]
                      }
                    },
                    {
                      "key" : 2015,
                      "doc_count" : 1,
                      "bynotes" : {
                        "doc_count_error_upper_bound" : 0,
                        "sum_other_doc_count" : 0,
                        "buckets" : [
                          {
                            "key" : 90,
                            "doc_count" : 1
                          }
                        ]
                      }
                    },
                    {
                      "key" : 2022,
                      "doc_count" : 1,
                      "bynotes" : {
                        "doc_count_error_upper_bound" : 0,
                        "sum_other_doc_count" : 0,
                        "buckets" : [
                          {
                            "key" : 100,
                            "doc_count" : 1
                          }
                        ]
                      }
                    }
                  ]
                }
              }
            },
            {
              "key" : "class 2",
              "doc_count" : 2,
              "nestedAGG" : {
                "doc_count" : 4,
                "byyear" : {
                  "doc_count_error_upper_bound" : 0,
                  "sum_other_doc_count" : 0,
                  "buckets" : [
                    {
                      "key" : 2022,
                      "doc_count" : 2,
                      "bynotes" : {
                        "doc_count_error_upper_bound" : 0,
                        "sum_other_doc_count" : 0,
                        "buckets" : [
                          {
                            "key" : 70,
                            "doc_count" : 1
                          },
                          {
                            "key" : 100,
                            "doc_count" : 1
                          }
                        ]
                      }
                    },
                    {
                      "key" : 2003,
                      "doc_count" : 1,
                      "bynotes" : {
                        "doc_count_error_upper_bound" : 0,
                        "sum_other_doc_count" : 0,
                        "buckets" : [
                          {
                            "key" : 80,
                            "doc_count" : 1
                          }
                        ]
                      }
                    },
                    {
                      "key" : 2021,
                      "doc_count" : 1,
                      "bynotes" : {
                        "doc_count_error_upper_bound" : 0,
                        "sum_other_doc_count" : 0,
                        "buckets" : [
                          {
                            "key" : 100,
                            "doc_count" : 1
                          }
                        ]
                      }
                    }
                  ]
                }
              }
            }
          ]
        }
      }
    }
    

    To reproduce

    To ingest the data. Notice the custom mapping:

    • keyword (grouping by term should not be done on text type fields)
    • Nested (Elastic by default flatten objects)
    PUT /71128503/
    {
      "settings": {},
      "mappings": {
        "properties": {
          "class": {
            "type": "keyword"
          },
          "name":{
            "type": "keyword"
          },
          "scores":{
            "type": "nested",
            "properties": {
              "score": {
                "type": "integer"
              },
              "year": {
                "type": "integer"
              }
            }
          }
        }
      }
    }
    
    
    POST /_bulk
    {"index":{"_index":"71128503","_id":1}}
    {"class":"class 1","name":"Scott","scores":[{"year":2022,"score":100},{"year":2011,"score":80},{"year":2003,"score":70}]}
    {"index":{"_index":"71128503","_id":2}}
    {"class":"class 1","name":"Gabriel","scores":[{"year":2015,"score":90},{"year":2011,"score":70}]}
    {"index":{"_index":"71128503","_id":3}}
    {"class":"class 2","name":"Scott","scores":[{"year":2022,"score":100},{"year":2021,"score":100},{"year":2003,"score":80}]}
    {"index":{"_index":"71128503","_id":4}}
    {"class":"class 2","name":"Pierce","scores":[{"year":2022,"score":70}]}
    
    

    Then to query the data:

    By class / By name

    GET /71128503/_search
    {
      "size": 0,
      "query": {
        "match_all": {}
      },
      "aggs": {
        "byName": {           <- Name of your bucket
          "terms": {          <- Type of grouping, Elastic support many, like sum, avg on numeric value ....
            "field": "name",  <- Field you grouping on
            "size": 10
          },
          "aggs": {
            "nestedAGG": {
              "nested": {
                "path": "scores"
              },
              "aggs": {
                "byyear": {
                  "terms": {
                    "field": "scores.year",
                    "size": 10
                  },
                  "aggs": {
                    "bynotes": {
                      "terms": {
                        "field": "scores.score",
                        "size": 10
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }