
Elasticsearch Aggregation of large list


I'm trying to count how many documents each ingredient appears in. My index body is similar to this:

index_body = {
   "settings":{
      "index":{
         "number_of_replicas":0,
         "number_of_shards":4,
         "refresh_interval":"-1",
         "knn":"true"
      }
   },
   "mappings":{
      "properties":{
         "recipe_id":{
            "type":"keyword"
         },
         "recipe_title":{
            "type":"text",
            "analyzer":"standard",
            "similarity":"BM25"
         },
         "description":{
             "type":"text",
             "analyzer":"standard",
             "similarity":"BM25"
         },
         "ingredient":{
            "type":"keyword"
         },
         "image":{
            "type":"keyword"
         },

         ....
   }
}

In the ingredient field, each document stores its ingredients as an array of strings: [ingredient1,ingredient2,....]

I have around 900 documents, each with its own ingredient list.

I've tried using a terms aggregation, but it doesn't return what I expected. Here is the query I've been using:

{
    "size":0,
    "aggs":{
        "ingredients":{
            "terms": {"field":"ingredient"}
        }
    }
}

But it returns this:

{'took': 4, 'timed_out': False, '_shards': {'total': 4, 'successful': 4, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 994, 'relation': 'eq'}, 'max_score': None, 'hits': []}, 'aggregations': {'ingredients': {'doc_count_error_upper_bound': 56, 'sum_other_doc_count': 4709, 'buckets': [{'key': 'salt', 'doc_count': 631}, {'key': 'oil', 'doc_count': 320}, {'key': 'sugar', 'doc_count': 314}, {'key': 'egg', 'doc_count': 302}, {'key': 'butter', 'doc_count': 291}, {'key': 'flour', 'doc_count': 264}, {'key': 'garlic', 'doc_count': 220}, {'key': 'ground pepper', 'doc_count': 185}, {'key': 'vanilla extract', 'doc_count': 146}, {'key': 'lemon', 'doc_count': 131}]}}}

This is clearly wrong, since I have many more ingredients than that. What am I doing wrong? Why does it return only these ones? Is there a way to force Elasticsearch to return all counts?


Solution

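  • A terms aggregation returns only the top 10 buckets by default, which is why you see exactly ten ingredients. The sum_other_doc_count of 4709 in your response is the combined document count of the buckets that were left out. You need to specify size inside the aggregation, set to at least the number of distinct ingredients:

    {
        "size":0,
        "aggs":{
            "ingredients":{
                "terms": {"field":"ingredient", "size": 10000}
            }
        }
    }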
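As a rough sketch of how this could look from Python: the helper names build_terms_query and counts_from_response below are made up for illustration, and the sample response is abridged from the one in the question. The query dict would be passed as the request body of a search call on your client. The second helper flattens the buckets into a dict and flags a result as truncated when sum_other_doc_count is above zero, which suggests size was still too small.

```python
def build_terms_query(field, size=10000, agg_name="ingredients"):
    """Build a search body that counts documents per distinct value of `field`.

    Hypothetical helper; `size` here is the number of buckets to return.
    """
    return {
        "size": 0,  # return no hits, only the aggregation results
        "aggs": {
            agg_name: {
                "terms": {"field": field, "size": size}
            }
        },
    }


def counts_from_response(response, agg_name="ingredients"):
    """Turn the aggregation buckets into {key: doc_count}.

    Also reports whether some buckets were left out of the response.
    """
    agg = response["aggregations"][agg_name]
    counts = {bucket["key"]: bucket["doc_count"] for bucket in agg["buckets"]}
    # A non-zero sum_other_doc_count means there are buckets beyond `size`.
    truncated = agg.get("sum_other_doc_count", 0) > 0
    return counts, truncated


# Abridged from the response in the question (first three buckets only).
sample_response = {
    "aggregations": {
        "ingredients": {
            "doc_count_error_upper_bound": 56,
            "sum_other_doc_count": 4709,
            "buckets": [
                {"key": "salt", "doc_count": 631},
                {"key": "oil", "doc_count": 320},
                {"key": "sugar", "doc_count": 314},
            ],
        }
    }
}

query = build_terms_query("ingredient")
counts, truncated = counts_from_response(sample_response)
print(query)
print(counts, truncated)
```

If truncated comes back True after you rerun the query, raise size until sum_other_doc_count drops to 0.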