
Django annotate with frequency


(Django 3.2.12, Python 3.9.3, MySQL 8.0.28)

Imagine models like the following:

from django.db import models

class User(models.Model):
    email = models.EmailField(...)
    created_datetime = models.DateTimeField(...)

class UserLog(models.Model):
    created_datetime = models.DateTimeField(...)
    user = models.ForeignKey('user.User', ...)
    login = models.BooleanField('Log in', ...)

And the following query, meant to annotate each user in the queryset with the frequency of their logins (log entries where login=True):

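from django.db import models
from django.db.models import Count, ExpressionWrapper, F, Q
from django.db.models.functions import Now

users = User.objects.filter(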
    Q(...)
).annotate(
    login_count=Count('userlog', filter=Q(userlog__login=True)),
    login_duration_over=Now() - F('created_datetime'),
    login_frequency=ExpressionWrapper(
        F('login_duration_over') / F('login_count'),
        output_field=models.DurationField()
    ),
)

This results in a SQL error:

(1064, "You have an error in your SQL syntax; ...")

The generated SQL (fragment for login_frequency) looks like this:

(
  INTERVAL TIMESTAMPDIFF(
    MICROSECOND,
    `user_user`.`created_datetime`,
    CURRENT_TIMESTAMP
  ) MICROSECOND / (
    COUNT(
        CASE WHEN `user_userlog`.`login` THEN `user_userlog`.`id` ELSE NULL END
    )
  )
) AS `login_frequency`,

MySQL rejects this. Similar code works on SQLite and, I am told, on PostgreSQL.

What is wrong with the ExpressionWrapper on MySQL?


Solution

  • Found a workaround:

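    from django.db import models
    from django.db.models import Count, ExpressionWrapper, F, Q
    from django.db.models.functions import Cast, Now

    users = User.objects.filter(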
        Q(...)
    ).annotate(
        login_count=Count('userlog', filter=Q(userlog__login=True)),
        login_duration_over=Now() - F('created_datetime'),
        login_frequency=Cast(
            ExpressionWrapper(
                Cast(F('login_duration_over'), output_field=models.BigIntegerField()) / F('login_count'),
                output_field=models.BigIntegerField()
            ),
            output_field=models.DurationField()
        )
    )
    

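    This forces the division to be performed database-side on big integers (microseconds); the result is then cast back to a timedelta. The original query presumably fails because MySQL accepts an INTERVAL ... MICROSECOND expression only as an operand of date arithmetic, so dividing it by the count is a syntax error.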

    MySQL stopped screaming and the results are correct.

    Even though that works, it feels ugly. Is there a better way?
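
One possible alternative, offered as an untested sketch rather than a confirmed fix: keep only the login_count and login_duration_over annotations and do the division in Python after the rows are fetched. The helper name average_login_interval is hypothetical. It assumes login_duration_over arrives as a datetime.timedelta and login_count as an int.

```python
from datetime import timedelta
from typing import Optional


def average_login_interval(duration_over: timedelta, login_count: int) -> Optional[timedelta]:
    """Return the average time between logins, or None when there are no logins."""
    if not login_count:
        # Avoid dividing by zero for users who never logged in.
        return None
    # Work in whole microseconds, mirroring the bigint division of the workaround.
    total_microseconds = duration_over // timedelta(microseconds=1)
    return timedelta(microseconds=total_microseconds // login_count)
```

It could be applied while iterating, for example user.login_frequency = average_login_interval(user.login_duration_over, user.login_count). The trade-off is that the value is then unavailable for filtering or ordering in the database, and the floor division may differ from the database's rounding by up to a microsecond.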