Search code examples
pythonpython-3.xdjangodjango-modelsdjango-queryset

Duplicate results when querying using SearchVector


I have an abstract model class called PostType with a few submodels derived from it. The model looks like this:


class PostType(models.Model):
    title = models.CharField(
        max_length=255,
        unique=True,
        verbose_name='Title'
    )
    description = models.CharField(
        max_length=255,
        default='',
        verbose_name='Description'
    )
    tags = models.ManyToManyField(
        to=Tag,
        blank=True,
        verbose_name='Tags'
    )
    content = RichTextUploadingField(
        default='',
        verbose_name='Post Content'
    )

    class Meta:
        abstract = True

It has a related Tag model that looks like this:


class Tag(models.Model):
    name = models.CharField(
        max_length=255,
        verbose_name='Name',
        unique=True
    )
    description = models.TextField(
        verbose_name='Description',
        default=''
    )

In one of my views I am trying to query all of the subclasses, using a SearchVector and searching multiple fields:


query = request.GET.get('s')

from django.contrib.postgres.search import SearchVector
get_qs_list = [model.objects.annotate(
    search=SearchVector('title', 'description', 'content', 'tags__name')
).filter(search=query) for model in models.PostType.__subclasses__()]

Now, the search is returning all of the results that it should. However, for some reason it's returning duplicate results for some items.

The problem seems to go away if I remove 'tags__name' from the SearchVector, but I don't understand why.

What am I doing wrong, and why does it work if I don't search the related field?

EDIT:

I just realized that I get a duplicate result for every Tag that the item is related to. So if the item has three tags, I will get three copies of the item in the search result. I still don't understand why this happens though, and I'm grateful for any advice!


Solution

  • I know the question is old, but I've been struggling today with a very similar issue and found a solution. If you look at the search vector definition:

    search = SearchVector(
        'title',
        'description',
        'content',
        'tags__name',
    )
    

    Explanation

    You might notice that while title, description and content are fields on your model PostType, the tags__name refers to a related model Tag. In this case the relation is not a one-to-one - there might be multiple tags related to a certain PostType. If the search results with more than one tag "matching" your search, then you will get exactly as many results and Django's distinct() won't help you (at least not in any way that I know of).

    However, since you are using postgres, you can also take advantage of another feature that is ArrayAgg. It allows you to annotate your queryset with an aggregated value of all tags, and later use is as it was a field on the PostType:

    from django.contrib.postgres.aggregates import ArrayAgg
    
    # 1. annotate with aggregated tag name values
    queryset = queryset.annotate(
        tags_names=ArrayAgg(
            "tags__name",
            distinct=True,
        ),
    )
    
    # 2. use the annotation in the SearchVector
    search = SearchVector(
        'title',
        'description',
        'content',
        'tags_names',  # <-- note: not `tags__name`
    )
    

    For each PostType there will be only one tags_names consisting of aggregated tags__name values now, so in other words - being a list of related tag names; which can now be used in SearchVector like any other field.

    Final solution

    Side note: apart from that, for better results you should also check if search__icontains=query.

    from django.contrib.postgres.search import SearchVector
    from django.contrib.postgres.aggregates import ArrayAgg
    from django.db.models import Q
    
    query = request.GET.get('s')
    
    get_qs_list = [
        model.objects.annotate(
            tags_names=ArrayAgg("tags__name", distinct=True),
            search=SearchVector(
                'title',
                'description',
                'content',
                'tags_names',
            ),
        ).filter(
            Q(search=query) | Q(search__icontains=query)
        ) for model in models.PostType.__subclasses__()
    ]