Search code examples
pythondjangodjango-querysetdjango-ormdjango-annotate

Month on month values in django query


I have an annotation like this: which displays the month wise count of a field

bar = Foo.objects.annotate(
    item_count=Count('item')
).order_by('-item_month', '-item_year')

and this produces output like this: html render

I would like to show the change in item_count when compared with the previous month item_count for each month (except the first month). How could I achieve this using annotations or do I need to use pandas?

Thanks

Edit: In SQL this becomes easy with LAG function, which is similar to

SELECT item_month, item_year, COUNT(item),
LAG(COUNT(item)) OVER (ORDER BY item_month, item_year)
FROM Foo 
GROUP BY item_month, item_year

(PS: item_month and item_year are date fields)

Do Django ORM have similar to LAG in SQL?


Solution

  • For these types of Query you need to use Window functions in django Orm

    For Lag you can take the help of

    https://docs.djangoproject.com/en/4.0/ref/models/database-functions/#lag

    Working Query in Orm will look like this :

    #models.py

    class Review(models.Model):
        user = models.ForeignKey(User, on_delete=models.CASCADE, related_name='review_user', db_index=True)
        review_text = models.TextField(max_length=5000)
        rating = models.SmallIntegerField(
            validators=[
                MaxValueValidator(10),
                MinValueValidator(1),
            ],
        )
        date_added = models.DateTimeField(db_index=True)
        review_id = models.AutoField(primary_key=True, db_index=True)
    

    This is just a dummy table to show you the use case of Lag and Window function in django Because examples are not available for Lag function on Django Docs.

    from django.db.models.functions import Lag, ExtractYear
    from django.db.models import F, Window
    
    print(Review.objects.filter().annotate(
            num_likes=Count('likereview_review')
        ).annotate(item_count_lag=Window(expression=Lag(expression=F('num_likes')),order_by=ExtractYear('date_added').asc())).order_by('-num_likes').distinct().query)
    

    Query will look like

    SELECT DISTINCT `temp_view_review`.`user_id`, `temp_view_review`.`review_text`, `temp_view_review`.`rating`, `temp_view_review`.`date_added`, `temp_view_review`.`review_id`, COUNT(`temp_view_likereview`.`id`) AS `num_likes`, LAG(COUNT(`temp_view_likereview`.`id`), 1) OVER (ORDER BY EXTRACT(YEAR FROM `temp_view_review`.`date_added`) ASC) AS `item_count_lag` FROM `temp_view_review` LEFT OUTER JOIN `temp_view_likereview` ON (`temp_view_review`.`review_id` = `temp_view_likereview`.`review_id`) GROUP BY `temp_view_review`.`review_id` ORDER BY `num_likes` DESC
    

    Also if you don't want to order_by on extracted year of date then you can use F expressions like this

    print(Review.objects.filter().annotate(
            num_likes=Count('likereview_review')
        ).annotate(item_count_lag=Window(expression=Lag(expression=F('num_likes')),order_by=[F('date_added')])).order_by('-num_likes').distinct().query)
    

    Query for this :

    SELECT DISTINCT `temp_view_review`.`user_id`, `temp_view_review`.`review_text`, `temp_view_review`.`rating`, `temp_view_review`.`date_added`, `temp_view_review`.`review_id`, COUNT(`temp_view_likereview`.`id`) AS `num_likes`, LAG(COUNT(`temp_view_likereview`.`id`), 1) OVER (ORDER BY `temp_view_review`.`date_added`) AS `item_count_lag` FROM `temp_view_review` LEFT OUTER JOIN `temp_view_likereview` ON (`temp_view_review`.`review_id` = `temp_view_likereview`.`review_id`) GROUP BY `temp_view_review`.`review_id` ORDER BY `num_likes` DESC