On the one hand, let's consider this Django model:
from django.db import models
from uuid import UUID
class Entry(models.Model):
id = models.UUIDField(primary_key=True, default=uuid4, editable=False)
value = models.DecimalField(decimal_places=12, max_digits=22)
items = ArrayField(base_field=models.UUIDField(null=False, blank=False), default=list)
On the other hand, let's say we have this dictionary:
coefficients = {item1_uuid: item1_coef, item2_uuid: item2_coef, ... }
Entry.value
is intended to be distributed among the Entry.items
according to coefficients
.
Using Django ORM, what would be the most efficient way (in a single SQL query) to get the sum of the values of my Entries
for a single Item
, given the coefficients?
For instance, for item1
below I want to get 168.5454...
, that is to say 100 * 1 + 150 * (0.2 / (0.2 + 0.35)) + 70 * 0.2
.
Entry ID | Value | Items |
---|---|---|
uuid1 | 100 | [item1_uuid] |
uuid2 | 150 | [item1_uuid, item2_uuid] |
uuid3 | 70 | [item1_uuid, item2_uuid, item3_uuid] |
coefficients = { item1_uuid: Decimal("0.2"), item2_uuid: Decimal("0.35"), item3_uuid: Decimal("0.45") }
Bonus question: how could I adapt my models for this query to run faster? I've deliberately chosen to use an ArrayField
and decided not to use a ManyToManyField
, was that a bad idea? How to know where I could add db_index[es]
for this specific query?
I am using Python 3.10, Django 4.1. and Postgres 14.
I've found a solution to my own question, but I'm sure someone here could come up with a more efficient & cleaner approach.
The idea here is to chain the .alias()
methods (cf. Django documentation) and the conditional expressions with Case
and When
in a for
loop.
This results in an overly complex query, which at least does work as expected:
def get_value_for_item(coefficients, item):
item_coef = coefficients.get(item.pk, Decimal(0))
if not item_coef:
return Decimal(0)
several = Q(items__len__gt=1)
queryset = (
Entry.objects
.filter(items__contains=[item.pk])
.alias(total=Case(When(several, then=Value(Decimal(0)))))
)
for k, v in coefficients.items():
has_k = Q(items__contains=[k])
queryset = queryset.alias(total=Case(
When(several & has_k, then=Value(v) + F("total")),
default="total",
)
)
return (
queryset.annotate(
coef_applied=Case(
When(several, then=Value(item_coef) / F("total") * F("value")),
default="value",
)
).aggregate(Sum("coef_applied", default=Decimal(0)))
)["coef_applied__sum"]
With the example I gave in my question and for item1
, the output of this function is Decimal(168.5454...)
as expected.