elasticsearch, elasticsearch-java-api, elasticsearch-aggregation

Elasticsearch Java API: Aggregation filter for document counts


I want to implement an aggregation that returns only the values (buckets) whose document count is above a certain threshold.

For instance, here is the aggregation that returns every value together with its document count:

AggregationBuilder aggregation = AggregationBuilders
                .terms("agg").field("column_name");

This gives me the document count for each value in column_name:

[{"doc_count":30,"key":"val1"},{"doc_count":29,"key":"val2"},{"doc_count":23,"key":"val3"}]

Now, let's say I don't want all of these buckets. I only want those that have a doc_count greater than 25.

So the ideal result would be

[{"doc_count":30,"key":"val1"},{"doc_count":29,"key":"val2"}]

How do I apply such a filter to my aggregation? I was looking at FilterBuilders and filter aggregations, but they apply filters to values within the documents. For instance, I can apply a filter to only get the documents where column_name == xza,

but that is not what I am looking for. I want to apply a threshold to the doc_count values after the aggregation has been computed.

Is this possible? I am using the Elasticsearch Java API, version 1.7.2.


Solution

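  • The terms aggregation has a built-in option called min_doc_count; see the Elasticsearch documentation for the terms aggregation. I haven't used the Java API, but an example I found seems to call .minDocCount() on the terms builder (ctrl-f 'minDocCount'). Note that the threshold is inclusive: a bucket is returned when its doc_count is greater than or equal to the value, so "greater than 25" corresponds to a min_doc_count of 26.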
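
To make the inclusive threshold concrete, here is a minimal plain-Java sketch that applies the same "count >= threshold" rule to the three buckets from the question. It has no Elasticsearch dependency; the class and method names (MinDocCountSketch, keysWithMinDocCount, sampleCounts) are made up for illustration and are not part of any library. With the 1.x Java API, the server-side equivalent would presumably be to chain .minDocCount(26) onto the terms builder from the question, as the answer suggests, though that call is not exercised here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MinDocCountSketch {

    // Hypothetical helper: keeps the keys whose count is >= minDocCount.
    // The comparison is inclusive, matching the documented behaviour of
    // min_doc_count on a terms aggregation.
    static List<String> keysWithMinDocCount(Map<String, Long> counts, long minDocCount) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Long> entry : counts.entrySet()) {
            if (entry.getValue() >= minDocCount) {
                kept.add(entry.getKey());
            }
        }
        return kept;
    }

    // Bucket counts copied from the question's example output.
    static Map<String, Long> sampleCounts() {
        Map<String, Long> counts = new LinkedHashMap<>();
        counts.put("val1", 30L);
        counts.put("val2", 29L);
        counts.put("val3", 23L);
        return counts;
    }

    public static void main(String[] args) {
        // "Greater than 25" is a strict bound, so the inclusive threshold is 26.
        System.out.println(keysWithMinDocCount(sampleCounts(), 26)); // prints [val1, val2]
    }
}
```

Doing the filtering on the server with min_doc_count is generally preferable to filtering on the client like this, because the excluded buckets are never sent back; the sketch only illustrates which buckets survive a given threshold.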