python-3.x, elasticsearch, elasticsearch-py

How to calculate total for each token in Elasticsearch


I have the following request to Elasticsearch:

{  
   "query":{  
      "bool":{  
         "must":[  
            {  
               "query_string":{  
                  "query":"something1 OR something2 OR something3",
                  "default_operator":"OR"
               }
            }
         ],
         "filter":{  
            "range":{  
               "time":{  
                  "gte":date
               }
            }
         }
      }
   }
}

I want to count, in a single request, how many of the matching documents contain each token. For example:

something1: 26 documents
something2: 12 documents
something3: 1 document

Solution

  • Assuming the tokens are not enumeration-like (i.e. a constrained set of specific values, such as state names, for which a terms aggregation with the right mapping would be your best bet), I think the closest thing to what you want is a filters aggregation with one named filter per token:

    POST your-index/_search
    {
      "query":{  
        "bool":{  
          "must":[  
          {  
            "query_string":{  
              "query":"something1 OR something2 OR something3",
              "default_operator":"OR"
            }
          }
          ],
          "filter":{  
            "range":{  
              "time":{  
                "gte":date
              }
            }
          }
        }
      },
      "aggs": {
        "token_doc_counts": {
          "filters" : {
            "filters" : {
              "something1" : { 
                "bool": { 
                  "must": { "query_string" : { "query" : "something1" } }, 
                  "filter": { "range": { "time": { "gte": date } } } 
                }
              },
              "something2" : { 
                "bool": { 
                  "must": { "query_string" : { "query" : "something2" } }, 
                  "filter": { "range": { "time": { "gte": date } } } 
                }
              },
              "something3" : { 
                "bool": { 
                  "must": { "query_string" : { "query" : "something3" } }, 
                  "filter": { "range": { "time": { "gte": date } } } 
                }
              }
            }
          }
        } 
      }
    }
    

    The response would look something like:

    {
      "took": 9,
      "timed_out": false,
      "_shards": ...,
      "hits": ...,
      "aggregations": {
        "token_doc_counts": {
          "buckets": {
            "something1": {
              "doc_count": 1
            },
            "something2": {
              "doc_count": 2
            },
            "something3": {
              "doc_count": 3
            } 
          } 
        } 
      }
    }
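
Since the question is tagged elasticsearch-py, here is a minimal Python sketch of the same idea: it builds the request body from a list of tokens and reads the per-token counts out of the response. The function names, the `size: 0` setting, and the connection details in the commented usage are assumptions for illustration, not part of the original answer. The sketch also assumes the tokens are plain words (no `query_string` special characters that would need escaping) and, as a simplification, it does not repeat the range filter inside each bucket, because aggregations are computed over the documents matched by the top-level query.

```python
from typing import Any, Dict, Iterable


def build_token_count_body(tokens: Iterable[str], since: str) -> Dict[str, Any]:
    """Build a search body with one named filter bucket per token."""
    tokens = list(tokens)
    return {
        # Assumption: only the counts are wanted, so no hits are returned.
        "size": 0,
        "query": {
            "bool": {
                "must": [
                    {
                        "query_string": {
                            "query": " OR ".join(tokens),
                            "default_operator": "OR",
                        }
                    }
                ],
                "filter": {"range": {"time": {"gte": since}}},
            }
        },
        "aggs": {
            "token_doc_counts": {
                "filters": {
                    # Each bucket is named after its token.
                    "filters": {
                        token: {"query_string": {"query": token}}
                        for token in tokens
                    }
                }
            }
        },
    }


def extract_token_counts(response: Dict[str, Any]) -> Dict[str, int]:
    """Map each token to the doc_count of its bucket."""
    buckets = response["aggregations"]["token_doc_counts"]["buckets"]
    return {token: bucket["doc_count"] for token, bucket in buckets.items()}


# Hypothetical usage (address, index name, and date are placeholders;
# newer client versions may prefer keyword arguments over `body`):
#
# from elasticsearch import Elasticsearch
# es = Elasticsearch("http://localhost:9200")
# body = build_token_count_body(["something1", "something2"], "2020-01-01")
# response = es.search(index="your-index", body=body)
# print(extract_token_counts(response))
```

Building the filters in a dict comprehension keeps the request in sync with the token list, so adding a token only requires changing the list passed in.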