Search code examples
pythondjangoperformancepostgresqlquery-optimization

Formulating an efficient DB query with highly challenging requirements (Django app)


This is a tough one. I have a django-based web-app where users can post interesting URLs, and have spectators comment under each posting. Moreover, I use stealth banning against misbehaving users.

Interesting URLs are represented by Link model; comments under the posting are reprenseted by Publicreply model.

Here's how these models are related to one other:

class Link(models.Model): 
    submitter = models.ForeignKey(User)
    submitted_on = models.DateTimeField(auto_now_add=True)
    url = models.URLField(max_length=250)

class Publicreply(models.Model):
    submitted_by = models.ForeignKey(User)
    answer_to = models.ForeignKey(Link)
    submitted_on = models.DateTimeField(auto_now_add=True)
    description = models.TextField(validators=[MaxLengthValidator(250)])

Users I ban go here:

class HellBanList(models.Model):
    condemned = models.ForeignKey(User)
    when = models.DateTimeField(auto_now_add=True)

What's my question? I'm trying to write the best performing, optimized query that gets me:

1. The publicreply with the absolutely latest timestamp FROM among all distinct links THAT either the user herself posted, OR below which the said user left a publicreply

2. But if the publicreply object gotten in 1 was of someone who's part of HellBanList (i.e. condemned attribute), I'd want to ignore that particular publicreply and dig up the next most-recent one. If that's of a banned user too, I'll skip to the next (and so on).

3. Finally, I only need submitted_by and submitted_on attribute values from the publicreply object I end up with after catering for both 1 and 2.


So what have I tried thus far? This:

freshest_link = Link.objects.filter(Q(submitter=self.request.user)|Q(publicreply__submitted_by=self.request.user)).distinct().annotate(date=Max('publicreply__submitted_on')).latest('date')
freshest_reply = Publicreply.objects.filter(answer_to=freshest_link).latest('submitted_by')

What's the problem with this code? The fact that it doesn't cater to requirement 2. I think I ought to have somehow run an exclude() on publicreplys posted by users in HellBanList before calculating freshest_reply (in order to meet requirement 2). Anyhow I can't figure out how to do that. Secondly, an expert can perhaps build the whole query more efficiently than how I've attempted it, goal being minimum possible DB roundtrips.

Can you help out in this? If you feel I've been opaque in my description of what I'm trying to achieve, do ask for clarification.

Note: my DB is postgres


Solution

  • Get the banned users list like:

    banlist = HellBanList.objects.values_list('condemned',flat=True)
    

    modify freshest_reply query like:

    freshest_reply = Publicreply.objects.filter(answer_to=freshest_link).exclude(submitted_by_id__in=banlist).latest('submitted_on')
    

    2nd method:

    In your query, first you get the Link which has the latest public reply and in the next query you get the latest Publicreply for that Link. Instead of that we can just get the latest reply in one query like:

    latest_reply = Publicreply.objects.filter(Q(answer_to__submitter=self.request.user)|Q(submitted_by=self.request.user)).exclude(submitted_by_id__in=banlist).latest('submitted_on')
    

    Let me know if I missed something.