python-2.7, elasticsearch, elasticsearch-dsl

Elasticsearch DSL: filter, then aggregate in Python


I need to filter documents in an Elasticsearch index and then aggregate the matching documents by a field. Here is the code for what I am trying to do:

import elasticsearch
from elasticsearch_dsl import Search, Q, Index, MultiSearch
es_client = elasticsearch.Elasticsearch([url],
        timeout=30, retry_on_timeout=True)
project_ids=['CSI'] 
family_ids=['SF6140691_WES_CIDR'] 
sample_ids=['S1379354_CIDR'] 
gene_symbols=['GLTPD1', 'CCNL2', 'MRPL20'] 

genes_filter = Q('bool', must=[Q('terms', project_id=project_ids),
                                   Q('terms', family_id=family_ids),
                                   Q('terms', sample_id=sample_ids),
                                   Q('terms', gene_symbol=gene_symbols)])
search = Search(using=es_client, index="GENES_DATA")
search = search.filter(genes_filter).execute()
results = search.aggs.bucket('by_family', 'terms', field='family_id', size=0)

Currently I am getting the following error:

'{!r} object has no attribute {!r}'.format(self.__class__.__name__, name))
AttributeError: 'Terms' object has no attribute 'execute'

I tried swapping the filter and the aggregation, and I tried calling execute() at the very end, but neither helps. How can filtering followed by aggregation be done in a single query? I found examples of aggregating on its own and of filtering on its own, but I have trouble finding one that combines both.


Solution

  • Instead of

    search = search.filter(genes_filter)
    results = search.aggs.bucket('by_family', 'terms', field='family_id', size=0)
    

    you should have:

    search = search.filter(genes_filter)
    search.aggs.bucket('by_family', 'terms', field='family_id', size=0)
    results = search.execute()
    

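    First add the filter, then define the aggregation, and finally execute the search. Note that search.aggs.bucket(...) modifies the search in place and returns the new aggregation (a Terms object) rather than the search itself, which is why calling execute() on its return value raises the AttributeError.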
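
To make the filter-then-aggregate order concrete, here is a minimal sketch of the request body that such a search roughly corresponds to, written with plain dictionaries so it runs without a cluster. The helper names `build_request_body` and `bucket_counts` are hypothetical, the exact nesting the DSL generates is an assumption, and the sample response is made up purely to show the shape of a terms aggregation result. The `size=0` option from the question is left out, since its support depends on the Elasticsearch version.

```python
# Hypothetical helpers; field names and values are taken from the question.

def build_request_body(project_ids, family_ids, sample_ids, gene_symbols):
    """Build a filter-then-aggregate request body as a plain dict."""
    must_clauses = [
        {"terms": {"project_id": project_ids}},
        {"terms": {"family_id": family_ids}},
        {"terms": {"sample_id": sample_ids}},
        {"terms": {"gene_symbol": gene_symbols}},
    ]
    return {
        # Filter context: documents are matched without being scored.
        "query": {"bool": {"filter": [{"bool": {"must": must_clauses}}]}},
        # The aggregation only sees documents that pass the query above.
        "aggs": {"by_family": {"terms": {"field": "family_id"}}},
    }


def bucket_counts(response, agg_name="by_family"):
    """Map each bucket key to its document count in a raw response dict."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


body = build_request_body(
    project_ids=["CSI"],
    family_ids=["SF6140691_WES_CIDR"],
    sample_ids=["S1379354_CIDR"],
    gene_symbols=["GLTPD1", "CCNL2", "MRPL20"],
)

# Made-up response, used only to illustrate the result shape; not real data.
sample_response = {
    "aggregations": {
        "by_family": {"buckets": [{"key": "SF6140691_WES_CIDR", "doc_count": 3}]}
    }
}
print(bucket_counts(sample_response))
```

With elasticsearch-dsl the same buckets should be reachable on the executed response as results.aggregations.by_family.buckets, each bucket exposing key and doc_count.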