Search code examples
elasticsearch-aggregationelasticsearch-dsl

Elasticsearch aggregation on array field with in a range


I'm going to perform a count aggregation on an array field by filtering the values in a range. For example, I have following 3 documents and I want to find out value counts for purchase_date_list that are between 20210101 to now(). The expected result (purchase count between 20210101 - now()) will be that: customer_id: 1, purchase count is: 2 customer_id: 2, purchase count is: 0 customer_id: 3, purchase count is: 1

Can anybody please help with some ideas on how to compose a aggregation query for above request?

Many thanks!

{
  customer_id: 1,
  purchase_date_list: [
    20050101,
    20210304,
    20211121
  ]
},
{
  customer_id: 2,
  purchase_date_list: [
    20100301
  ]
},
{
  customer_id: 3,
  purchase_date_list: [
    20210701
  ]
}

Solution

  • Following on my comment, it solved it using painless. (As I still am not sure how do deal with it using aggregation)

    Here is the documentation that helped me solve this issue. [doc]

    Set up

    PUT /so_agg_test/
    
    POST /so_agg_test/_doc
    {
      "customer_id": 1,
      "purchase_date_list": [
        20050101,
        20210304,
        20211121
      ]
    }
    
    POST /so_agg_test/_doc
    {
      "customer_id": 2,
      "purchase_date_list": [
        20100301
      ]
    }
    
    POST /so_agg_test/_doc
    {
      "customer_id": 3,
      "purchase_date_list": [
        20210701
      ]
    }
    
    GET /so_agg_test/_search
    

    Solution

    This query will create a new field, named number_of_sales_interval in you hits.

    GET /so_agg_test/_search
    {
      "query": {
        "match_all": {}
      },
      "script_fields": {
        "number_of_sales_interval": {
          "script": {
            "lang": "painless",
            "params": {
              "lower_bound": 20210101
            }, 
            "source": """
            def dates = doc['purchase_date_list'];
            
            def number_of_sales_interval = 0;
            for(date in dates){
              if(date > params.lower_bound){
                number_of_sales_interval += 1;
              }
            }
            
            return number_of_sales_interval;
            """
          }
        }
      }
    }
    

    Results

    You should have something along those lines.

    {
      "took" : 0,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 3,
          "relation" : "eq"
        },
        "max_score" : 1.0,
        "hits" : [
          {
            "_index" : "so_agg_test",
            "_type" : "_doc",
            "_id" : "UOVplX0B0iK523s0yaCu",
            "_score" : 1.0,
            "fields" : {
              "number_of_sales_interval" : [
                2
              ]
            }
          },
          {
            "_index" : "so_agg_test",
            "_type" : "_doc",
            "_id" : "UeVplX0B0iK523s01KC-",
            "_score" : 1.0,
            "fields" : {
              "number_of_sales_interval" : [
                0
              ]
            }
          },
          {
            "_index" : "so_agg_test",
            "_type" : "_doc",
            "_id" : "UuVplX0B0iK523s04KAT",
            "_score" : 1.0,
            "fields" : {
              "number_of_sales_interval" : [
                1
              ]
            }
          }
        ]
      }
    }