I'm using the HyperLogLog (hll) field to represent unique users, using the Django django-pg-hll
package. What I'd like to do is get a cumulative total of unique users over a specific time period, but I'm having trouble doing this.
Given a model like:
class DailyUsers(model.Model):
date = models.DateField()
users = HllField()
I can get the cummulative HllField
for each day like so:
queryset = models.DailyUsers.objects.annotate(
cumulative_hll_users=Window(
UnionAgg("users"), order_by=F('date').asc()
)
)
However, when I try and get the cardinality
(the actual number) like so:
queryset = queryset.annotate(
cumsum=Cardinality("cumulative_hll_users")
)
The following error occurs:
django.db.utils.ProgrammingError: OVER specified, but hll_cardinality is not a window function nor an aggregate function
LINE 1: SELECT "app_dailyusers"."date", hll_cardinality...
Which is odd because Cardinality
is defined as an aggregate function. I'm not sure if there's a way around this, I imagine it might be possible to do this in raw sql, but I haven't made much progress.
A solution in either the Django ORM or raw SQL would be greatly appreciated.
This bug occurs because the django-pg-hll
pacakage uses the hll_cardinality
function instead of the #
operator for window functions. Moving to a raw
sql solution fixed the issue.