I have an abstract model class called PostType
with a few submodels derived from it.
The model looks like this:
class PostType(models.Model):
title = models.CharField(
max_length=255,
unique=True,
verbose_name='Title'
)
description = models.CharField(
max_length=255,
default='',
verbose_name='Description'
)
tags = models.ManyToManyField(
to=Tag,
blank=True,
verbose_name='Tags'
)
content = RichTextUploadingField(
default='',
verbose_name='Post Content'
)
class Meta:
abstract = True
It has a related Tag model that looks like this:
class Tag(models.Model):
name = models.CharField(
max_length=255,
verbose_name='Name',
unique=True
)
description = models.TextField(
verbose_name='Description',
default=''
)
In one of my views I am trying to query all of the subclasses, using a SearchVector
and searching multiple fields:
query = request.GET.get('s')
from django.contrib.postgres.search import SearchVector
get_qs_list = [model.objects.annotate(
search=SearchVector('title', 'description', 'content', 'tags__name')
).filter(search=query) for model in models.PostType.__subclasses__()]
Now, the search is returning all of the results that it should. However, for some reason it's returning duplicate results for some items.
The problem seems to go away if I remove 'tags__name'
from the SearchVector
, but I don't understand why.
What am I doing wrong, and why does it work if I don't search the related field?
EDIT:
I just realized that I get a duplicate result for every Tag
that the item is related to. So if the item has three tags, I will get three copies of the item in the search result. I still don't understand why this happens though, and I'm grateful for any advice!
I know the question is old, but I've been struggling today with a very similar issue and found a solution. If you look at the search vector definition:
search = SearchVector(
'title',
'description',
'content',
'tags__name',
)
You might notice that while title
, description
and content
are fields on your model PostType
, the tags__name
refers to a related model Tag
. In this case the relation is not a one-to-one - there might be multiple tags related to a certain PostType. If the search results with more than one tag "matching" your search, then you will get exactly as many results and Django's distinct()
won't help you (at least not in any way that I know of).
However, since you are using postgres, you can also take advantage of another feature that is ArrayAgg. It allows you to annotate your queryset with an aggregated value of all tags, and later use is as it was a field on the PostType:
from django.contrib.postgres.aggregates import ArrayAgg
# 1. annotate with aggregated tag name values
queryset = queryset.annotate(
tags_names=ArrayAgg(
"tags__name",
distinct=True,
),
)
# 2. use the annotation in the SearchVector
search = SearchVector(
'title',
'description',
'content',
'tags_names', # <-- note: not `tags__name`
)
For each PostType
there will be only one tags_names
consisting of aggregated tags__name
values now, so in other words - being a list of related tag names; which can now be used in SearchVector
like any other field.
Side note: apart from that, for better results you should also check if search__icontains=query
.
from django.contrib.postgres.search import SearchVector
from django.contrib.postgres.aggregates import ArrayAgg
from django.db.models import Q
query = request.GET.get('s')
get_qs_list = [
model.objects.annotate(
tags_names=ArrayAgg("tags__name", distinct=True),
search=SearchVector(
'title',
'description',
'content',
'tags_names',
),
).filter(
Q(search=query) | Q(search__icontains=query)
) for model in models.PostType.__subclasses__()
]