Elasticsearch: adding a text field to my aggregation


I have article information like this in Elasticsearch:

{
   "ArticleId":355027,
   "ArticleNumber":"433398",
   "CharacteristicsMultiValue":[
      {
         "Name":"Aantal cartridges",
         "Value":"4",
         "NumValue":4,
         "Priority":2147483647
      },
      {
         "Name":"ADF",
         "Value":"Ja",
         "Priority":10,
         "Description":"Een Automatic Document Feeder (ADF), of automatische documentinvoer, laat een multifunctionele printer (all-in-one) automatisch meerdere vellen na elkaar verwerken. Door meerdere vellen in de ADF te plaatsen, wordt ieder vel papier stuk voor stuk automatisch gekopieerd of gescand."
      },
      {
         "Name":"Scanresolutie",
         "Value":"600x600 DPI",
         "Priority":2147483647
      }
   ]
}

I'm running the following query to retrieve all the occurrences of the CharacteristicsMultiValue for my search with all possible values and sort them to my liking.

{
  "query": {
    "query_string": {
     "query": "433398",
     "default_operator": "and"
    }
  },
  "aggs":{
    "CharacteristicsMultiValue":{
      "nested":{
        "path":"CharacteristicsMultiValue"
       },
       "aggs":{
         "Name":{
           "terms":{
            "field":"CharacteristicsMultiValue.Name",
            "size":25
          },
          "aggs":{
            "Value":{
              "terms":{
                "field":"CharacteristicsMultiValue.Value",
                "size":25
              }
            }, 
            "Priority":{
              "avg":{
                "field":"CharacteristicsMultiValue.Priority"
              }
            },
            "Characteristics_sort": {
              "bucket_sort": {
                "sort": [
                  { "Priority": { "order": "asc" } } 
                ]                               
              }
            }       
          }
        }
      }
    }
  }
}

The result contains a list of CharacteristicsMultiValue buckets like the one below.

{
   "key":"ADF",
   "doc_count":1,
   "Priority":{
      "value":10
   },
   "Value":{
      "doc_count_error_upper_bound":0,
      "sum_other_doc_count":0,
      "buckets":[
         {
            "key":"Ja",
            "doc_count":1
         }
      ]
   }
}

This all works great. I want to make a change so that the CharacteristicsMultiValue.Description field is included in the aggregation. I'm not really experienced with Elasticsearch, but I feel I should be able to do this pretty easily.

I did some research, and as I understand it I need to add a new sub-aggregation for the Description field. I tried to do that by adding the JSON below to my current query in several places, but I keep getting 404 errors. Could anyone tell me how to add the (first found) Description field to my aggregation?

"aggs":{
    "Description":{
        "terms":{
            "field":"CharacteristicsMultiValue.Description",
            "size":1
        }
    }
}

I tested the solution proposed by Joe. This results in the following error response:

{ 
  "error": { 
    "root_cause": [ 
      {
        "type": "illegal_argument_exception",
        "reason": "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [CharacteristicsMultiValue.Description] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "articles_dev1_nl",
        "node": "HiGH6JY9QvOozRSWJmFXpw",
        "reason": {
          "type": "illegal_argument_exception",
          "reason": "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [CharacteristicsMultiValue.Description] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
        }
      }
    ],
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [CharacteristicsMultiValue.Description] in order to load field data by uninverting the inverted index. Note that this can use significant memory.",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [CharacteristicsMultiValue.Description] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
      }
    }
  },
  "status": 400
}

Solution

  • I don't know why you're getting 404 errors; a 400 Bad Request is the usual response when an aggregation's syntax is off.

    Either way, if you want to find the top Description terms under every bucketed Value, you can use:

    {
      "query": {
        "query_string": {
          "query": "433398",
          "default_operator": "and"
        }
      },
      "aggs": {
        "CharacteristicsMultiValue": {
          "nested": {
            "path": "CharacteristicsMultiValue"
          },
          "aggs": {
            "Name": {
              "terms": {
                "field": "CharacteristicsMultiValue.Name",
                "size": 25
              },
              "aggs": {
                "Value": {
                  "terms": {
                    "field": "CharacteristicsMultiValue.Value",
                    "size": 25
                  },
        -->       "aggs": {
                    "Description": {
                      "terms": {
                        "field": "CharacteristicsMultiValue.Description",
                        "size": 1
                      }
                    }
                  }
                },
                "Priority": {
                  "avg": {
                    "field": "CharacteristicsMultiValue.Priority"
                  }
                },
                "Characteristics_sort": {
                  "bucket_sort": {
                    "sort": [
                      {
                        "Priority": {
                          "order": "asc"
                        }
                      }
                    ]
                  }
                }
              }
            }
          }
        }
      }
    }
    

    Generally speaking, sub-aggregations adhere to the following schema:

    {
      "query": { },  // optional query
      "aggs": {
        "your_agg_name": {
          "agg_type": {
            // agg spec
          },
          "aggs": {
            "your_sub_agg_name_1": {
              "agg_type": {
                // agg spec
              }
            },
            "your_sub_agg_name_2_if_needed": {
              "agg_type": {
                // agg spec
              }
            },
            ...
          }
        }
      }
    }
    

    and you can:

    • nest further sub-aggs like you're already doing with Name->Value or Value->Description from my example
    • or keep them on the same level like you did with Name->Value and Name->Priority.
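
The schema above can also be assembled programmatically. Here is a minimal Python sketch that builds the same aggregation tree as plain dictionaries. The `agg` helper is hypothetical (it is not part of any Elasticsearch client); it only illustrates that nesting one level deeper (Value->Description) and staying on the same level (Value, Priority) come down to where a sub-aggregation is attached:

```python
def agg(agg_type, spec, **sub_aggs):
    """Build one aggregation node: {agg_type: spec, "aggs": {...sub-aggs...}}."""
    node = {agg_type: spec}
    if sub_aggs:
        # Sub-aggregations always live under the "aggs" key next to the agg type.
        node["aggs"] = sub_aggs
    return node


PATH = "CharacteristicsMultiValue"

body = {
    "aggs": {
        PATH: agg(
            "nested", {"path": PATH},
            Name=agg(
                "terms", {"field": f"{PATH}.Name", "size": 25},
                # Value carries its own sub-agg (one level deeper).
                Value=agg(
                    "terms", {"field": f"{PATH}.Value", "size": 25},
                    Description=agg(
                        "terms", {"field": f"{PATH}.Description", "size": 1}
                    ),
                ),
                # Priority and the sort are siblings of Value (same level).
                Priority=agg("avg", {"field": f"{PATH}.Priority"}),
                Characteristics_sort=agg(
                    "bucket_sort", {"sort": [{"Priority": {"order": "asc"}}]}
                ),
            ),
        )
    }
}
```

Serializing `body` with `json.dumps` should give the aggregation part of the request shown earlier; the query part is left out here to keep the sketch short.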

    💡 Tip: your query is already quite heavily nested so you could explore the typed_keys query parameter to determine more easily which bucket corresponds to which sub-aggregation.
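
With `typed_keys` enabled, each aggregation name in the response is prefixed with its type and a `#`, for example `sterms#Name` or `avg#Priority`. Below is a small Python sketch of how that prefix could be used to outline a response. The `sample` dict is a hand-written, trimmed illustration rather than real output, and the helper names are made up for this example:

```python
def split_typed_key(key):
    """Split 'sterms#Name' into ('sterms', 'Name'); plain keys give (None, key)."""
    agg_type, sep, name = key.partition("#")
    return (agg_type, name) if sep else (None, key)


def describe(aggs, depth=0):
    """Return one indented line per aggregation found in a typed_keys response."""
    lines = []
    for key, value in aggs.items():
        agg_type, name = split_typed_key(key)
        if agg_type is None or not isinstance(value, dict):
            continue  # regular fields such as "key", "doc_count", "value"
        lines.append("  " * depth + f"{name} ({agg_type})")
        # Single-bucket aggs (e.g. nested) hold their sub-aggs directly ...
        lines.extend(describe(value, depth + 1))
        # ... while multi-bucket aggs (e.g. terms) hold them inside each bucket.
        for bucket in value.get("buckets", []):
            lines.extend(describe(bucket, depth + 1))
    return lines


# Hand-written illustration of a typed_keys response fragment (assumption).
sample = {
    "nested#CharacteristicsMultiValue": {
        "doc_count": 3,
        "sterms#Name": {
            "buckets": [
                {
                    "key": "ADF",
                    "doc_count": 1,
                    "avg#Priority": {"value": 10},
                    "sterms#Value": {"buckets": [{"key": "Ja", "doc_count": 1}]},
                }
            ]
        },
    }
}

outline = describe(sample)
```

Here `outline` lists the nested aggregation first, then Name, then Priority and Value one level below it, which makes it easier to see which bucket belongs to which sub-aggregation.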


    Edit

    As described in the error message, the Description field needs to be aggregatable before any aggregations are performed on it.

    So if you drop and recreate your index, you can turn fielddata on in the mapping:

    PUT articles_dev1_nl
    {
      "mappings": {
        "properties": {
          "CharacteristicsMultiValue": {
            "type": "nested",
            "properties": {
              .... other props ...
              
              "Description": {
                "type": "text",
                "fielddata": true        <---
              }
            }
          }
        }
      }
    }
    

    or, if your index already exists, you can use the update mapping API:

    PUT articles_dev1_nl/_mapping
    {
      "properties": {
        "CharacteristicsMultiValue": {
          "type": "nested",
          "properties": {
            "Description": {
              "type": "text",
              "fielddata": true
            }
          }
        }
      }
    }
    

    You can learn more about fielddata vs. keyword in the Elasticsearch docs.
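
As the error message itself suggests, a keyword field is the alternative to fielddata. One thing to keep in mind is that a terms aggregation on an analyzed text field with fielddata buckets the individual tokens rather than the whole description, whereas a keyword field keeps the full string as a single term. A minimal Python sketch of that route follows. It assumes a multi-field called `keyword` (the name is only a convention chosen for this example) and assumes existing documents are reindexed afterwards so the new sub-field gets populated:

```python
INDEX = "articles_dev1_nl"
PATH = "CharacteristicsMultiValue"


def description_keyword_mapping(subfield="keyword"):
    """Mapping body that adds a keyword sub-field next to the text field."""
    return {
        "properties": {
            PATH: {
                "type": "nested",
                "properties": {
                    "Description": {
                        "type": "text",
                        # The sub-field name is an assumption for this sketch.
                        "fields": {subfield: {"type": "keyword"}},
                    }
                },
            }
        }
    }


def description_agg(subfield="keyword"):
    """Description sub-aggregation pointing at the keyword sub-field."""
    return {
        "Description": {
            "terms": {"field": f"{PATH}.Description.{subfield}", "size": 1}
        }
    }


mapping_body = description_keyword_mapping()
sub_agg = description_agg()
```

`mapping_body` would be the body of `PUT articles_dev1_nl/_mapping`, and `sub_agg` would take the place of the Description sub-aggregation under Value in the query above.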