Let's assume there's a (simplified) index:
PUT test
{
"mappings": {
"properties": {
"numeric_field_sometimes_empty": {
"type": "integer"
},
"numeric_field_always_present": {
"type": "integer"
}
}
}
}
with numeric fields that may or may not be present in some (potentially all filtered) docs:
POST test/_doc
{
"numeric_field_always_present": 10
}
POST test/_doc
{
"numeric_field_always_present": 20
}
I'm wanting to execute a bucket_script
to calculate certain trends and since bucket_script
needs to be a child of a multi-bucket agg, I simulate one using filters
. After that nothing stands in the way of creating my numeric single-bucket sub-aggs like so:
GET test/_search
{
"size": 0,
"aggs": {
"multibucket_simulator": {
"filters": {
"filters": {
"all": {
"match_all": {}
}
}
},
"aggs": {
"avg_empty": {
"avg": {
"field": "numeric_field_sometimes_empty"
}
},
"avg_non_null": {
"avg": {
"field": "numeric_field_always_present"
}
},
"diff": {
"bucket_script": {
"buckets_path": {
"now": "avg_empty.value",
"before": "avg_non_null.value"
},
"script": """
return (params.now != null ? params.now : 0)
- (params.before != null ? params.before : 0)
""",
"format": "###.##"
}
}
}
}
}
}
Since I know that some of those sub-aggs' results may be null
(a strict null
type, not 0
), I check whether that was the case with the ternary operators and proceed to return the value difference. This yields:
{
"aggregations":{
"multibucket_simulator":{
"buckets":{
"all":{
"doc_count":2,
"avg_non_null":{
"value":15.0
},
"avg_empty":{
"value":null
}
}
}
}
}
}
with the diff
bucket script sub-agg being fully left out. That's suboptimal...
I've tried removing .value
from the paths so I can access and check .value
directly in the script -- to no avail.
The question is then -- why are the null
buckets being skipped and, also, is there an alternative besides bucket_script
for this use case?
The docs state that
The specified metric must be numeric and the script must return a numeric value.
I suppose it's a matter of discussion whether null
falls into that category.
EDIT 1
With that being said, setting a missing
parameter on each avg
agg solves this problem.
It was the gap_policy
. It defaults to skip
and needed to be set to insert_zeros
. Here's the reason.