I want to implement an aggregation that only returns the terms (buckets) whose document count is above a certain threshold.
For instance, here is the aggregation that gets every value along with its document count:
AggregationBuilder aggregation = AggregationBuilders
.terms("agg").field("column_name");
This gives me the number of documents for each value in column_name:
[{"doc_count":30,"key":"val1"},{"doc_count":29,"key":"val2"},{"doc_count":23,"key":"val3"}]
Now, let's say I don't want all of these buckets. I only want the ones that have a doc_count greater than 25.
So the ideal result would be:
[{"doc_count":30,"key":"val1"},{"doc_count":29,"key":"val2"}]
How do I apply such a filter to my aggregation? I was looking at FilterBuilders and filter aggregations, but they apply filters to values inside the documents. For instance, I can apply a filter to only get the documents where column_name is xza, but that is not what I am looking for. I want to apply a threshold to the doc_count values after the aggregation has been applied.
Is this possible? I am using the Elasticsearch Java API, version 1.7.2.
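To pin down what is being asked: the threshold is a filter over the returned buckets, not over document fields. The sketch below models that in plain Java with no Elasticsearch dependency; BucketThreshold, its nested Bucket class and the above() method are hypothetical names standing in for the key/doc_count pairs shown in the JSON. Filtering like this on the client would also work as a fallback, at the cost of transferring buckets that are then discarded.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: applies a doc_count threshold to buckets *after* aggregation,
// which is what the question asks for (not a filter on document fields).
final class BucketThreshold {

    // Hypothetical stand-in for one terms bucket, e.g. {"doc_count":30,"key":"val1"}.
    static final class Bucket {
        final String key;
        final long docCount;

        Bucket(String key, long docCount) {
            this.key = key;
            this.docCount = docCount;
        }
    }

    // Keeps only buckets whose doc_count is strictly greater than the threshold.
    static List<Bucket> above(List<Bucket> buckets, long threshold) {
        List<Bucket> kept = new ArrayList<>();
        for (Bucket b : buckets) {
            if (b.docCount > threshold) {
                kept.add(b);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // The sample buckets from the question.
        List<Bucket> buckets = new ArrayList<>();
        buckets.add(new Bucket("val1", 30));
        buckets.add(new Bucket("val2", 29));
        buckets.add(new Bucket("val3", 23));
        for (Bucket b : above(buckets, 25)) {
            System.out.println(b.key + " " + b.docCount); // prints "val1 30", then "val2 29"
        }
    }
}
```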
The terms aggregation has a built-in option called min_doc_count. See here for its documentation. I haven't used the Java API, but this example seems to use .minDocCount() (Ctrl-F 'minDocCount').
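One detail worth checking: the Elasticsearch reference describes min_doc_count as returning terms found in at least that many documents, so the bound is inclusive and "greater than 25" corresponds to min_doc_count = 26. With the 1.x Java API the call would presumably be AggregationBuilders.terms("agg").field("column_name").minDocCount(26); I believe TermsBuilder exposes minDocCount(long) in 1.7, but treat that as an assumption to verify against your version. The block below does not call Elasticsearch; it is a plain-Java model of the inclusive rule on the question's sample counts, and MinDocCountModel and its methods are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java model (no Elasticsearch dependency) of the min_doc_count rule:
// a terms bucket is returned only if doc_count >= min_doc_count (inclusive bound).
final class MinDocCountModel {

    static Map<String, Long> applyMinDocCount(Map<String, Long> counts, long minDocCount) {
        Map<String, Long> kept = new LinkedHashMap<>(); // preserves bucket order
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            if (e.getValue() >= minDocCount) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    // Because the bound is inclusive, "doc_count greater than n" means min_doc_count = n + 1.
    static long minDocCountForGreaterThan(long n) {
        return n + 1;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new LinkedHashMap<>();
        counts.put("val1", 30L);
        counts.put("val2", 29L);
        counts.put("val3", 23L);
        long min = minDocCountForGreaterThan(25); // 26
        System.out.println(applyMinDocCount(counts, min)); // prints {val1=30, val2=29}
    }
}
```

On this particular data any value from 24 to 29 gives the same two buckets, but only 26 matches "greater than 25" in general, for example when some bucket has exactly 25 documents.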