Search code examples
django-queryset

Django ORM Database Query Optimization


I am a beginner in backend development and Django, and I have a task to optimize database queries. I need a method that will return all participants based on the specified conditions. To assess the number and time of queries, I am using Django DebugToolbar. I need to avoid N database queries.

Initially, I had the following method:

class ParticipantListView(generics.ListAPIView):
    queryset = Participant.objects.all()
    serializer_class = ParticipantSerializer
    filter_backends = [DjangoFilterBackend, filters.SearchFilter, filters.OrderingFilter]
    filterset_fields = ['gender', 'first_name', 'last_name']
    search_fields = ['first_name', 'last_name', 'gender']
    ordering_fields = ['first_name', 'last_name', 'gender']

    def get_queryset(self):
        queryset = super().get_queryset()
        distance = self.request.query_params.get('distance')

        if distance:
            user_latitude = User.objects.get(id=2).participant.latitude
            user_longitude = User.objects.get(id=2).participant.longitude

            user_location = (user_latitude, user_longitude)

            for participant in queryset:
                participant_latitude = participant.latitude
                participant_longitude = participant.longitude
                participant_location = (participant_latitude, participant_longitude)

                dist = geo_distance(user_location, participant_location).km
                if dist > float(distance):
                    queryset = queryset.exclude(pk=participant.pk)

        return queryset

Database queries in implementations through iteration. But I found this implementation incorrect and slow, so I started looking for a better approach. That's when I learned about Annotate in the Django ORM.

class ParticipantListViewTest(generics.ListAPIView):
    queryset = Participant.objects.all()
    serializer_class = ParticipantSerializer
    filter_backends = [DjangoFilterBackend, filters.SearchFilter, filters.OrderingFilter]
    filterset_fields = ['gender', 'first_name', 'last_name']
    search_fields = ['first_name', 'last_name', 'gender']
    ordering_fields = ['first_name', 'last_name', 'gender']

    def get_queryset(self):
        queryset = super().get_queryset()
        distance = self.request.query_params.get('distance')

        if distance:
            user = User.objects.get(id=2).participant
            user_latitude = user.latitude
            user_longitude = user.longitude

            # Calculate the distance in kilometers using Haversine formula
            lat_diff = Radians(F('latitude') - user_latitude)
            long_diff = Radians(F('longitude') - user_longitude)
            a = (Sin(lat_diff / 2) ** 2) + Cos(Radians(user_latitude)) * Cos(Radians(F('latitude'))) * (
                        Sin(long_diff / 2) ** 2)
            c = 2 * ACos(Cos(a))
            earth_radius_km = 6371.0
            distance_km = ExpressionWrapper(c * earth_radius_km, output_field=FloatField())

            queryset = queryset.annotate(distance_km=distance_km).filter(distance_km__lt=float(distance))
        return queryset

Queries to the database in implementations through Annotate. I assumed that this approach would help me avoid N queries, but Django DebugToolbar still shows that I have N queries.

My model and serializer.

class ParticipantSerializer(serializers.ModelSerializer):
    class Meta:
        model = Participant
        fields = '__all__'
class Participant(models.Model):
    GENDER_CHOICES = [
        ('M', ' Male'),
        ('F', 'Female'),
        ('O', 'Other'),
    ]
    user = models.OneToOneField(User, on_delete=models.CASCADE, default=None)
    avatar = models.ImageField(upload_to='avatars/')
    gender = models.CharField(max_length=1, choices=GENDER_CHOICES)
    first_name = models.CharField(max_length=50)
    last_name = models.CharField(max_length=50)
    email = models.EmailField(unique=True)
    likes = models.ManyToManyField('self', related_name='liked_by', symmetrical=False, blank=True)
    latitude = models.DecimalField(max_digits=9, decimal_places=6)
    longitude = models.DecimalField(max_digits=9, decimal_places=6)

I would like to ask for help with the following questions:

What is the purpose of Annotate, if this approach is not faster than the normal loop iteration? How can I avoid N database queries?

I will be grateful if you could also recommend resources that discuss best practices for query optimization.


Solution

  • Your approach with annotating the location to reduce the number of queries seems good to me. (Also you should look at DB functions that calculate distance instead of doing it yourself).

    Looking at the toolbar, the issue lies in the ParticipantLike relation.
    You did not share your models, but I guess you're serializing related ParticipantLikes in your ParticipantSerializer.
    For each Participant in the queryset, there will be a query to get Likes. This is what you see in the toolbar.
    If you're only accessing my_participant.likes.all() in the serializer, a simple queryset = queryset.prefetch_related("likes") would be enough.

    If you need some filtering on those like, you'll need to do prefetch_related(Prefetch("likes", queryset=ParticipantLike.objects.filter(...))). Calling my_participant.likes.all() will yield the related Likes matching the filters without additionnal queries.

    There is a good chance you want to count likes. In this case some .annotate() can help you. queryset = queryset.annotate(like_count=Count("likes")) for instance