Search code examples
elasticsearchelasticsearch-aggregationelasticsearch-painless

ElasticSearch 'bucket_script' not executing when one of the bucket paths resolves to 'null'


Let's assume there's a (simplified) index:

PUT test
{
  "mappings": {
    "properties": {
      "numeric_field_sometimes_empty": {
        "type": "integer"
      },
      "numeric_field_always_present": {
        "type": "integer"
      }
    }
  }
}

with numeric fields that may or may not be present in some (potentially all filtered) docs:

POST test/_doc
{
  "numeric_field_always_present": 10  
}

POST test/_doc
{
  "numeric_field_always_present": 20  
}

I'm wanting to execute a bucket_script to calculate certain trends and since bucket_script needs to be a child of a multi-bucket agg, I simulate one using filters. After that nothing stands in the way of creating my numeric single-bucket sub-aggs like so:

GET test/_search
{
  "size": 0,
  "aggs": {
    "multibucket_simulator": {
      "filters": {
        "filters": {
          "all": {
            "match_all": {}
          }
        }
      },
      "aggs": {
        "avg_empty": {
          "avg": {
            "field": "numeric_field_sometimes_empty"
          }
        },
        "avg_non_null": {
          "avg": {
            "field": "numeric_field_always_present"
          }
        },
        "diff": {
          "bucket_script": {
            "buckets_path": {
              "now": "avg_empty.value",
              "before": "avg_non_null.value"
            },
            "script": """
              return (params.now != null ? params.now : 0) 
                   - (params.before != null ? params.before : 0)
            """,
            "format": "###.##"
          }
        }
      }
    }
  }
}

Since I know that some of those sub-aggs' results may be null (a strict null type, not 0), I check whether that was the case with the ternary operators and proceed to return the value difference. This yields:

{
  "aggregations":{
    "multibucket_simulator":{
      "buckets":{
        "all":{
          "doc_count":2,
          "avg_non_null":{
            "value":15.0
          },
          "avg_empty":{
            "value":null
          }
        }
      }
    }
  }
}

with the diff bucket script sub-agg being fully left out. That's suboptimal...

I've tried removing .value from the paths so I can access and check .value directly in the script -- to no avail.

The question is then -- why are the null buckets being skipped and, also, is there an alternative besides bucket_script for this use case?


Solution

  • The docs state that

    The specified metric must be numeric and the script must return a numeric value.

    I suppose it's a matter of discussion whether null falls into that category.


    EDIT 1

    With that being said, setting a missing parameter on each avg agg solves this problem.


    EDIT 2 & Actual Anwer

    It was the gap_policy. It defaults to skip and needed to be set to insert_zeros. Here's the reason.