python django analytics dashboard

Counting distinct values of one field within each distinct value of another field in the same Django model


I am trying to pull some analytics from my Django model table. So far I can count the total values of a field and the distinct values of a field. I also know how to build lists showing the total values of a field within each distinct value of another field. Now I'd like to count the distinct values of one field within each distinct value of a different field. Here's the table I am working with:

| uid   |  cid   |
|-------|--------|
| a     | apple  |
| a     | apple  |
| a     | grape  |
| b     | apple  |
| b     | grape  |
| c     | apple  |
| c     | pear   |
| c     | pear   |
| c     | pear   |

So the result I am trying to provide is:

cid: apple (distinct uid count: 3),
cid: grape (distinct uid count: 2),
cid: pear (distinct uid count: 1)

and also:

cid apple's distinct uid's: a, b, c
cid grape's distinct uid's: a, b
cid pear's distinct uid's: c

So far I have been able to get distinct counts and lists like this:

dist_uid_list = Fruit.objects.filter(client=user).values('uid').distinct()
output >>> <QuerySet [{'uid': 'a'}, {'uid': 'b'}, {'uid': 'c'}]>

and this:

dist_uid_count = Fruit.objects.filter(client=user).values('uid').distinct().count()
output >>> 3

and more complex:

total_actions_per_cid = Fruit.objects\
            .filter(client=user)\
            .values('cid').distinct()\
            .annotate(num_actions=Count('action_name'))\
            .order_by('cid')
output >>> <QuerySet [{'cid': 'apple', 'num_actions': 4}, {'cid': 'grape', 'num_actions': 2}, {'cid': 'pear', 'num_actions': 3}]>

So here is the question: how could I go in and take each distinct 'cid' and find a count of how many distinct 'uid's exist within each?

Here are the Django files that might be helpful to see:

models.py

from django.db import models


class Fruit(models.Model):
    uid = models.CharField(max_length=50, blank=True)
    cid = models.CharField(max_length=50, blank=True)
    record_date = models.DateTimeField(auto_now_add=True)
    client = models.CharField(max_length=50, blank=True)
    action_name = models.CharField(max_length=50, blank=True)

views.py

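from django.contrib.auth.mixins import LoginRequiredMixin
from django.contrib.auth.models import User  # assumes the default User model
from django.db.models import Count
from django.shortcuts import get_object_or_404
from django.views.generic import ListView

from .models import Fruit  # assumes models.py is in the same app


class DashboardListView(LoginRequiredMixin, ListView):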
    model = Fruit
    template_name = 'blog/dashboard.html'
    context_object_name = 'fruit'
    ordering = ['-record_date']
    
    def get_context_data(self, **kwargs):
        user = get_object_or_404(User, username=self.kwargs.get('username'))
        context = super().get_context_data(**kwargs)
        dist_uid_list = Fruit.objects.filter(client=user).values('uid').distinct()
        dist_uid_count = Fruit.objects.filter(client=user).values('uid').distinct().count()
        total_actions_per_cid = Fruit.objects\
            .filter(client=user)\
            .values('cid').distinct()\
            .annotate(num_actions=Count('action_name'))\
            .order_by('cid')
        context['dist_uid_list'] = dist_uid_list
        context['dist_uid_count'] = dist_uid_count
        context['total_actions_per_cid'] = total_actions_per_cid
        # get_context_data must return the context, or the template gets nothing
        return context

html outputs

{% for user in dist_uid_list %}
    {{ user.uid }}
{% endfor %}

    {{ dist_uid_count }}

{% for action in total_actions_per_cid %}
    {{ action.num_actions }}
{% endfor %}

I assume there needs to be some sort of for loop and multiple defs involved in views to make this work. I just can't quite figure out how I should go about doing that.


Solution

  • The Count aggregate has a distinct parameter that may help:

    >>> q = Book.objects.annotate(Count('authors', distinct=True), Count('store', distinct=True))
    

    https://docs.djangoproject.com/en/3.1/topics/db/aggregation/#combining-multiple-aggregations

    Thus your query would look like:

    # I removed the .distinct() after .values(): when followed by
    # .annotate(), .values('cid') acts like a GROUP BY, so each 'cid'
    # already appears only once in the result
    total_actions_per_cid = Fruit.objects\
                .filter(client=user)\
                .values('cid') \
                .annotate(num_uids=Count('uid', distinct=True))
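The query above gives the counts; the question also asks for the list of distinct uids behind each cid. Below is a minimal sketch of both on the question's sample table, using the standard-library sqlite3 module rather than Django so it runs on its own. The table name `fruit` and the variable names are made up for illustration, and the SQL is only an approximation of what the ORM generates. In the view, the distinct pairs could presumably come from `Fruit.objects.filter(client=user).values_list('cid', 'uid').distinct()` instead.

```python
import sqlite3
from collections import defaultdict

# Sample rows from the question's table, as (uid, cid)
rows = [
    ("a", "apple"), ("a", "apple"), ("a", "grape"),
    ("b", "apple"), ("b", "grape"),
    ("c", "apple"), ("c", "pear"), ("c", "pear"), ("c", "pear"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fruit (uid TEXT, cid TEXT)")  # hypothetical table
conn.executemany("INSERT INTO fruit (uid, cid) VALUES (?, ?)", rows)

# Roughly the SQL shape of
# .values('cid').annotate(num_uids=Count('uid', distinct=True))
counts = dict(
    conn.execute(
        "SELECT cid, COUNT(DISTINCT uid) FROM fruit GROUP BY cid"
    ).fetchall()
)

# Distinct (cid, uid) pairs, comparable to
# .values_list('cid', 'uid').distinct()
pairs = conn.execute("SELECT DISTINCT cid, uid FROM fruit").fetchall()
conn.close()

# Group the pairs in Python: one set of uids per cid
uids_per_cid = defaultdict(set)
for cid, uid in pairs:
    uids_per_cid[cid].add(uid)

for cid in sorted(uids_per_cid):
    uids = ", ".join(sorted(uids_per_cid[cid]))
    print(f"cid {cid}: distinct uid count {counts[cid]}, uids: {uids}")
```

Grouping in Python keeps the query simple and works on any database backend; the length of each set also equals the distinct count, so a single pairs query would be enough if both results are needed in the template.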