Search code examples
djangodjango-querysetdjango-nonrel

I'm confused about how distinct() works with Django queries


I have this query:

checkins = CheckinAct.objects.filter(time__range=[start, end], location=checkin.location)

Which works great for telling me how many checkins have happened in my date range for a specific location. But I want know how many checkins were done by unique users. So I tried this:

checkins = CheckinAct.objects.filter(time__range=[start, end], location=checkin.location).values('user').distinct()

But that doesn't work, I get back an empty Array. Any ideas why?

Here is my CheckinAct model:

class CheckinAct(models.Model):
    user = models.ForeignKey(User)
    location = models.ForeignKey(Location)
    time = models.DateTimeField()

----Update------ So now I have updated my query to look like this:

 checkins = CheckinAct.objects.values('user').\
                            filter(time__range=[start, end], location=checkin.location).\
                            annotate(dcount=Count('user'))

But I'm still getting multiple objects back that have the same user, like so:

 [{'user': 15521L}, {'user': 15521L}, {'user': 15521L}, {'user': 15521L}, {'user': 15521L}]

---- Update 2------ Here is something else I tried, but I'm still getting lots of identical user objects back when I log the checkins object.

checkins = CheckinAct.objects.filter(
                    time__range=[start, end],
                    location=checkin.location,
                ).annotate(dcount=Count('user')).values('user', 'dcount')
        logger.info("checkins!!! : " + str(checkins))

Logs the following:

checkins!!! : [{'user': 15521L}, {'user': 15521L}, {'user': 15521L}]

Notice how there are 3 instances of the same user object. Is this working correctly or not? Is there a difference way to read out what comes back in the dict object? I just need to know how many unique users check into that specific location during the time range.


Solution

  • The answer is actually right in the Django docs. Unfortunately, very little attention is drawn to the importance of the particular part you need; so it's understandably missed. (Read down a little to the part dealing with Items.)

    For your use-case, the following should give you exactly what you want:

    checkins = CheckinAct.objects.filter(time__range=[start,end], location=checkin.location).\
                                  values('user').annotate(checkin_count=Count('pk')).order_by()
    

    UPDATE

    Based on your comment, I think the issue of what you wanted to achieve has been confused all along. What the query above gives you is a list of the number of times each user checked in at a location, without duplicate users in said list. It now seems what you really wanted was the number of unique users that checked in at one particular location. To get that, use the following (which is much simpler anyways):

    User.objects.filter(checkinat__location=location).distinct().count()
    

    UPDATE for non-rel support

    checkin_users = [(c.user.pk, c.user) for c in CheckinAct.objects.filter(location=location)]
    unique_checkins = len(dict(checkin_users))
    

    This works off the principle that dicts have unique keys. So when you convert the list of tuples to a dict, you end up with a list of unique users. But, this will generate 1*N queries, where N is the total amount of checkins (one query each time the user attribute is used. Normally, I'd do something like .select_related('user'), but that too requires a JOIN, which is apparently out. JOINs not being supported seems like a huge downside to non-rel, if true, but if that's the case this is going to be your only option.