
ElasticSearch - Filter nested objects without affecting the "parent" object


I have an ElasticSearch mapping for a blog object that contains a nested field for comments, so that a user can add comments to the blog content shown below. Each comment has a published flag that determines whether the comment can be viewed by other users or only by the main user.

"blogs": [
    {
        "id": 1,
        "content": "This is my super cool blog post",
        "createTime": "2017-05-31",
        "comments": [
            {"published": false, "comment": "You can see this!!", "time": "2017-07-11"}
        ]
    },
    {
        "id": 2,
        "content": "Hey Guys!",
        "createTime": "2013-05-30",
        "comments": [
            {"published": true, "comment": "I like this post!", "time": "2016-07-01"},
            {"published": false, "comment": "You should not be able to see this", "time": "2017-10-31"}
        ]
    },
    {
        "id": 3,
        "content": "This is a blog without any comments! You can still see me.",
        "createTime": "2017-12-21",
        "comments": None
    },
]

I want to filter the comments so that only published comments (published: true) are displayed for each blog object. I still want to show every blog, not just those that have published comments. All of the other solutions I have found online also filter the blog objects themselves. Is there a way to filter out the unpublished comments without affecting which blogs the query returns?

So the query should return the above example like this:

"blogs": [
    {
        "id": 1,
        "content": "This is my super cool blog post",
        "createTime": "2017-05-31",
        "comments": None  # OR EMPTY LIST
    },
    {
        "id": 2,
        "content": "Hey Guys!",
        "createTime": "2013-05-30",
        "comments": [
            {"published": true, "comment": "I like this post!", "time": "2016-07-01"}
        ]
    },
    {
        "id": 3,
        "content": "This is a blog without any comments! You can still see me.",
        "createTime": "2017-12-21",
        "comments": None
    },
]

The result still includes the blogs that have no comments or only unpublished comments.

Is this possible?

I have been using a nested query from this example: ElasticSearch - Get only matching nested objects with All Top level fields in search response

But that approach also filters the blogs themselves: it does not return blogs that have only unpublished comments or no comments at all.
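For illustration, here is a minimal sketch of the kind of nested query described above, written as a plain Python dict. It assumes the `comments` field is mapped as `nested` and uses the field names from the sample data; the function name `published_comments_query` and the helper `parent_matches` are hypothetical. A nested query matches a parent document only when at least one of its nested objects matches, which is why blogs 1 and 3 drop out. The helper is only a local stand-in for that parent-level behaviour, not a call to Elasticsearch.

```python
def published_comments_query():
    """Build a request body for a nested query on the `comments` path."""
    return {
        "query": {
            "nested": {
                "path": "comments",
                "query": {"term": {"comments.published": True}},
                # inner_hits asks for the nested comments that matched.
                "inner_hits": {},
            }
        }
    }


# Trimmed copy of the sample blogs from the question.
blogs = [
    {"id": 1, "comments": [{"published": False}]},
    {"id": 2, "comments": [{"published": True}, {"published": False}]},
    {"id": 3, "comments": None},
]


def parent_matches(blog):
    """Stand-in for the parent-level rule: at least one nested match is needed."""
    return any(c["published"] for c in (blog["comments"] or []))


body = published_comments_query()
matched_ids = [b["id"] for b in blogs if parent_matches(b)]
```

With the sample data, only blog 2 satisfies the parent-level condition, which matches the behaviour described in the question.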

Please help :) Thank you!


Solution

  • OK, so I found out that there is apparently no way to do this using Elasticsearch queries alone. But I figured out a way to do it on the Django/Python side (which is what I needed). I'm not sure if anyone else will need this, but if you are using Django/ES/REST, this is what I did.

    I followed the elasticsearch-dsl documentation (http://elasticsearch-dsl.readthedocs.io/en/latest/) to connect Elasticsearch with my Django app. Then I used the rest_framework_elasticsearch package to create the views.

    To return only the published nested comments for each item in the Elasticsearch results, create a subclass of the ListElasticMixin class from rest_framework_elasticsearch.es_mixins. Then override the es_representation method in the new mixin as follows.

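    # Import path assumed from the rest_framework_elasticsearch package.
    from rest_framework_elasticsearch.es_mixins import ListElasticMixin


    class MyListElasticMixin(ListElasticMixin):
        @staticmethod
        def es_representation(iterable):
            # Start from the default representation, then prune it.
            items = ListElasticMixin.es_representation(iterable)

            for item in items:
                comments = item.get('comments')
                if comments is None:
                    continue
                # Iterate in reverse so that removing an element does not
                # shift the elements that are still to be visited.
                for comment in reversed(comments):
                    if not comment['published']:
                        comments.remove(comment)

            return items
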

    Make sure that you use the reversed function in the for loop over the comments. Removing items from a list while iterating over it forwards shifts the remaining items, so you will skip over some of the comments.
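To see why the reversed call matters, here is a minimal standalone sketch on plain dictionaries. The sample comments and the function names (`make_comments`, `prune_forward`, `prune_reversed`, `prune_rebuild`) are hypothetical and only serve the illustration. The last function shows a common alternative, rebuilding the list with a comprehension instead of mutating it in place.

```python
def make_comments():
    """Return a fresh sample list with two adjacent unpublished comments."""
    return [
        {"published": False, "comment": "hidden 1"},
        {"published": False, "comment": "hidden 2"},
        {"published": True, "comment": "I like this post!"},
    ]


def prune_forward(comments):
    """Buggy: each removal shifts later items left, so one gets skipped."""
    for comment in comments:
        if not comment["published"]:
            comments.remove(comment)
    return comments


def prune_reversed(comments):
    """Works: a removal only shifts items that were already visited."""
    for comment in reversed(comments):
        if not comment["published"]:
            comments.remove(comment)
    return comments


def prune_rebuild(comments):
    """Alternative: build a new list and leave the original untouched."""
    return [c for c in comments if c["published"]]


forward = prune_forward(make_comments())
backward = prune_reversed(make_comments())
rebuilt = prune_rebuild(make_comments())

print([c["comment"] for c in forward])   # "hidden 2" survives the forward loop
print([c["comment"] for c in backward])  # only the published comment remains
```

The comprehension version avoids the pitfall entirely, at the cost of allocating a new list, so with that approach the result has to be assigned back to the item's comments key.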

    Then I use this new mixin in my view.

    from rest_framework import viewsets


    class MyViewSet(MyListElasticMixin, viewsets.ViewSet):
        # Your view code here

        def get(self, request, *args, **kwargs):
            # Delegate to the list() method inherited through the mixin.
            return self.list(request, *args, **kwargs)


    Doing it on the Python side was definitely easier, and it worked.