I am using ES to get statistics per day in a case where I do not have input data for everyday, but where missing data is also an information as such (in my case I am measuring an activity). To achieve such task I use the average bucket aggregation and I set the min_doc_count to 0 in order to have a value even when my data is missing. My query is the following :
query = {
"size": 0,
"query": {
"bool" : {
"should" :{
"match" : { "LogEntryType" : "LogEntry" }
},
"must" : {
'match' : {'Id' : id}
}
}
},
"aggs": {
"actions_per_day": {
"date_histogram": {
"field": "CreationTime",
"interval": "day",
"min_doc_count" : 0
},
"aggs": {
"amount": {
"value_count": {
"field": "CreationTime"
}
}
}
},
"avg_daily_action": {
"avg_bucket": {
"buckets_path": "actions_per_day>amount"
}
}
}
}
es1.search(index=indx1, body=query)
And it does return objects with a value set to 0, as you can see in the following extract of the response :
{'took': 1,
'timed_out': False,
'_shards': {'total': 2, 'successful': 2, 'failed': 0},
'hits': {'total': 4321, 'max_score': 0.0, 'hits': []},
'aggregations': {'actions_per_day': {'buckets': [
{'key_as_string': '2018-07-13T00:00:00.000Z',
'key': 1531440000000,
'doc_count': 631,
'amount': {'value': 631}},
{'key_as_string': '2018-07-14T00:00:00.000Z',
'key': 1531526400000,
'doc_count': 0,
'amount': {'value': 0}}...
But my problem is that for the average daily action, the computation does not consider the 0 values, it returns :
...'avg_daily_action': {'value': 432.1}
Which corresponds to the average not taking into account the days when no input was given. I have found this issue, where it is mentioned that I should use the "missing" parameter, however I think that it is to retrieve elements with 0, and I already have them. Any clues on how to take them into account?
It turns out I was almost there. One must use the gap_policy parameter in the average bucket aggregation to take into account the 0 values. The average bucket part now becomes
"avg_daily_operations": {
"avg_bucket": {
"buckets_path": "modifications_per_day>amount",
"gap_policy" : "insert_zeros"
}
}
The default value is set to skip, and changing it solves the issue.