python elasticsearch elasticsearch-aggregation elasticsearch-dsl-py

How to filter the buckets that have more than N documents using Elasticsearch DSL in Python?

I have an index in Elasticsearch in which each document contains information about a user along with one of the Facebook posts they have made (the data is denormalized, so there is one document per post).

Each document contains: User_ID | User_Name | Post_Text | Post_Emojis

I want to retrieve the IDs of the users who have more than N posts.
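In plain Python terms, the goal is a group-by-and-count followed by a filter. The sketch below uses made-up sample documents and assumes the lowercase `user_id` key used in the code further down (the real field name may differ). It shows the result the aggregation should reproduce: the terms aggregation plays the role of the counter, and the filter step keeps only users whose count is greater than N.

```python
from collections import Counter


def users_with_more_posts_than(docs: list, num_posts: int) -> list:
    """Reference semantics: count documents per user_id, keep users with > num_posts."""
    # One document per post, so the number of documents per user is their post count.
    counts = Counter(doc["user_id"] for doc in docs)
    return sorted(user for user, count in counts.items() if count > num_posts)


# Hypothetical sample documents (denormalized: user info repeated on every post).
sample_docs = [
    {"user_id": "u1", "user_name": "Ann", "post_text": "hello"},
    {"user_id": "u1", "user_name": "Ann", "post_text": "again"},
    {"user_id": "u1", "user_name": "Ann", "post_text": "third"},
    {"user_id": "u2", "user_name": "Bob", "post_text": "hi"},
]

print(users_with_more_posts_than(sample_docs, 2))  # ['u1']
```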

I am new to Elasticsearch, and especially to the Search DSL for Python (https://elasticsearch-dsl.readthedocs.io/en/latest/search_dsl.html).

I am creating buckets using the terms aggregation on the User_ID field, and want to filter the buckets based on the number of documents that fall inside each bucket.
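For reference, this is roughly the JSON request such a query corresponds to, written as a plain Python dict. It is a sketch under assumptions: the `field` default follows the question's code (if `user_id` is mapped as `text`, a `keyword` sub-field such as `user_id.keyword` would be needed), and the `size` of 1000 is arbitrary. It matters because the terms aggregation returns only 10 buckets by default, and the `bucket_selector` can only drop buckets from that set.

```python
import json


def users_with_more_posts_body(num_posts: int, field: str = "user_id", size: int = 1000) -> dict:
    """Build a search body: terms buckets on `field`, filtered by a bucket_selector."""
    return {
        "size": 0,  # only the aggregation is needed, not the matching documents
        "aggs": {
            "posts_count": {
                "terms": {"field": field, "size": size},  # one bucket per user
                "aggs": {
                    # bucket_selector must be a sub-aggregation of the multi-bucket agg
                    "having_posts": {
                        "bucket_selector": {
                            "buckets_path": {"postsCount": "_count"},  # _count = doc_count
                            "script": f"params.postsCount > {num_posts}",
                        }
                    }
                },
            }
        },
    }


print(json.dumps(users_with_more_posts_body(5), indent=2))
```

A dict like this could be passed to `Search.from_dict(...)` in elasticsearch-dsl, or sent as the request body with the low-level `elasticsearch` client.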

This is the function I came up with. However, since I'm unsure of the proper syntax and still find the documentation confusing, I can't get it to execute and return the correct response.

def users_more_posts_than_query(search_object: Search, num_posts: int):
    search_object = search_object.aggs.bucket('posts_count', 'terms', field='user_id')\
        .pipeline("having_posts", "bucket_selector", buckets_path={"postsCount": "_count"}, script=f"params.postsCount > {num_posts}")

    response = search_object.execute()

    for hit in response.hits:
            hit.user_id

Please point out what I am doing wrong here, and how I can achieve my desired goal.

Solution

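Don't reassign search_object. Defining aggregations with aggs.bucket(...) modifies the Search in place and returns the new aggregation, not the Search, so after the assignment search_object is an aggregation object that cannot be executed. Also, aggregation results are returned separately from the hits, under response.aggregations, so iterating over response.hits will never show the buckets.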
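To make the second point concrete: Elasticsearch returns aggregation results under a top-level `aggregations` key, next to (not inside) `hits`. The sketch below reads user IDs out of a hand-written dict whose shape mirrors a search response; all values are invented for illustration. In elasticsearch-dsl the same data is reachable as `response.aggregations.posts_count.buckets`, where each bucket exposes `key` and `doc_count`.

```python
def user_ids_from_response(raw: dict) -> list:
    """Collect bucket keys from the `posts_count` terms aggregation of a raw response."""
    buckets = raw.get("aggregations", {}).get("posts_count", {}).get("buckets", [])
    return [bucket["key"] for bucket in buckets]


# Hypothetical response: hits hold documents, aggregations hold the buckets.
example_response = {
    "hits": {"total": {"value": 42, "relation": "eq"}, "hits": []},
    "aggregations": {
        "posts_count": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "u1", "doc_count": 12},
                {"key": "u7", "doc_count": 9},
            ],
        }
    },
}

print(user_ids_from_response(example_response))  # ['u1', 'u7']
```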

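from elasticsearch_dsl import Search


def users_more_posts_than_query(search_object: Search, num_posts: int):
    # aggs.bucket() modifies search_object in place and returns the new terms
    # aggregation, so its return value must not be assigned back to search_object.
    search_object.aggs.bucket('posts_count', 'terms', field='user_id').pipeline(
        "having_posts", "bucket_selector",
        buckets_path={"postsCount": "_count"},  # _count is each bucket's doc_count
        script=f"params.postsCount > {num_posts}")

    response = search_object.execute()
    # Aggregation results live in response.aggregations, separate from response.hits;
    # each bucket's key is a user_id value.
    return [bucket.key for bucket in response.aggregations.posts_count.buckets]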
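An alternative worth checking against your Elasticsearch version: the terms aggregation has a `min_doc_count` parameter that drops buckets with fewer documents than the given value, so "more than N posts" can also be expressed as `min_doc_count = N + 1` without a pipeline aggregation. The sketch below builds that request body as a plain dict; the `user_id` field and the `size` default of 1000 are assumptions (the terms aggregation returns only 10 buckets unless `size` is raised). In elasticsearch-dsl this would correspond to passing `min_doc_count=num_posts + 1` and a `size` as keyword arguments to `aggs.bucket('posts_count', 'terms', ...)`.

```python
def users_with_more_posts_body_min_doc_count(
    num_posts: int, field: str = "user_id", size: int = 1000
) -> dict:
    """Filter terms buckets by document count using min_doc_count instead of a pipeline."""
    return {
        "size": 0,  # skip returning documents; only the buckets are needed
        "aggs": {
            "posts_count": {
                "terms": {
                    "field": field,
                    "size": size,
                    # keep terms with doc_count >= num_posts + 1, i.e. > num_posts
                    "min_doc_count": num_posts + 1,
                }
            }
        },
    }


print(users_with_more_posts_body_min_doc_count(5)["aggs"]["posts_count"]["terms"]["min_doc_count"])  # 6
```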