I have reviews, and each review is linked to a property. A property id can have multiple reviews, and some of those reviews can be duplicates. I need to get the total number of duplicate reviews for each property.
This is how I get the duplicate reviews for each property:
{
  "query": {
    "bool": {
      "must_not": [
        {
          "term": {
            "reviewKeyword": {
              "value": ""
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "propertyGrouping": {
      "terms": {
        "field": "propertyId",
        "size": 10
      },
      "aggs": {
        "dupReviwes": {
          "terms": {
            "field": "reviewKeyword",
            "size": 100
          }
        }
      }
    }
  }
}
I also want the sum of the doc counts returned by the sub-aggregation. This is a sample of the propertyGrouping aggregation in the response:
{
  "buckets": [
    {
      "key": 532,
      "doc_count": 2431,
      "dupReviwes": {
        "doc_count_error_upper_bound": 10,
        "sum_other_doc_count": 2382,
        "buckets": [
          {
            "key": "Good hotel",
            "doc_count": 31
          },
          {
            "key": "Overall good",
            "doc_count": 18
          }
        ]
      }
    },
    {
      "key": 496,
      "doc_count": 2207,
      "dupReviwes": {
        "doc_count_error_upper_bound": 8,
        "sum_other_doc_count": 2185,
        "buckets": [
          {
            "key": "Good",
            "doc_count": 16
          },
          {
            "key": "Nice",
            "doc_count": 6
          }
        ]
      }
    }
  ]
}
So for each property bucket I want the sum of the doc counts of its dupReviwes buckets. For the response above, that would be:
key: 532, sum_doc_count = 49 (31 + 18)
key: 496, sum_doc_count = 22 (16 + 6)
Is this possible with a query?
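Yes, you can do that with a sum_bucket pipeline aggregation. Add it as a sibling of the reviewKeyword terms aggregation inside propertyGrouping, and point its buckets_path at the document count (_count) of each bucket of that sub-aggregation. The sub-aggregation is named dupReviews below, so the path is dupReviews>_count; the name in buckets_path must match the sub-aggregation's name exactly. Modify your query like this: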
{
  "query": {
    "bool": {
      "must_not": [
        {
          "term": {
            "reviewKeyword": {
              "value": ""
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "propertyGrouping": {
      "terms": {
        "field": "propertyId",
        "size": 10
      },
      "aggs": {
        "dupReviews": {
          "terms": {
            "field": "reviewKeyword",
            "size": 100
          }
        },
        "sum_buckets": {
          "sum_bucket": {
            "buckets_path": "dupReviews>_count"
          }
        }
      }
    }
  }
}
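With this query, each propertyGrouping bucket should also contain a sum_buckets object whose value field holds the total (sum_bucket returns it as a double). As a sketch, assuming the dupReviews sub-aggregation returns only the two buckets shown in the sample response above, the relevant part of the response would look roughly like this (the dupReviews buckets themselves are left out for brevity):

```json
{
  "buckets": [
    {
      "key": 532,
      "doc_count": 2431,
      "sum_buckets": {
        "value": 49.0
      }
    },
    {
      "key": 496,
      "doc_count": 2207,
      "sum_buckets": {
        "value": 22.0
      }
    }
  ]
}
```

Keep in mind that sum_bucket only adds up the buckets the terms aggregation actually returns, i.e. the top "size" (here 100) keywords; documents counted in sum_other_doc_count are not included. It also sums every returned keyword bucket, including keywords that occur only once. If only repeated keywords should count as duplicates, setting "min_doc_count": 2 on the dupReviews terms aggregation is one option to try.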