Search code examples
pythondjangodjango-modelsdjango-querysetdjango-orm

Complex Django query involving an ArrayField & coefficients


On the one hand, let's consider this Django model:

from django.db import models
from uuid import UUID

class Entry(models.Model):
    id = models.UUIDField(primary_key=True, default=uuid4, editable=False)
    value = models.DecimalField(decimal_places=12, max_digits=22)
    items = ArrayField(base_field=models.UUIDField(null=False, blank=False), default=list)

On the other hand, let's say we have this dictionary:

coefficients = {item1_uuid: item1_coef, item2_uuid: item2_coef, ... }

Entry.value is intended to be distributed among the Entry.items according to coefficients.

Using Django ORM, what would be the most efficient way (in a single SQL query) to get the sum of the values of my Entries for a single Item, given the coefficients?

For instance, for item1 below I want to get 168.5454..., that is to say 100 * 1 + 150 * (0.2 / (0.2 + 0.35)) + 70 * 0.2.

Entry ID Value Items
uuid1 100 [item1_uuid]
uuid2 150 [item1_uuid, item2_uuid]
uuid3 70 [item1_uuid, item2_uuid, item3_uuid]
coefficients = { item1_uuid: Decimal("0.2"), item2_uuid: Decimal("0.35"), item3_uuid: Decimal("0.45") }

Bonus question: how could I adapt my models for this query to run faster? I've deliberately chosen to use an ArrayField and decided not to use a ManyToManyField, was that a bad idea? How to know where I could add db_index[es] for this specific query?

I am using Python 3.10, Django 4.1. and Postgres 14.


Solution

  • I've found a solution to my own question, but I'm sure someone here could come up with a more efficient & cleaner approach.

    The idea here is to chain the .alias() methods (cf. Django documentation) and the conditional expressions with Case and When in a for loop.

    This results in an overly complex query, which at least does work as expected:

    def get_value_for_item(coefficients, item):
            item_coef = coefficients.get(item.pk, Decimal(0))
            if not item_coef:
                    return Decimal(0)
            several = Q(items__len__gt=1)
            queryset = (
                    Entry.objects
                    .filter(items__contains=[item.pk])
                    .alias(total=Case(When(several, then=Value(Decimal(0)))))
            )
            for k, v in coefficients.items():
                    has_k = Q(items__contains=[k])
                    queryset = queryset.alias(total=Case(
                            When(several & has_k, then=Value(v) + F("total")),
                            default="total",
                    )
            )
            return (
                    queryset.annotate(
                            coef_applied=Case(
                                    When(several, then=Value(item_coef) / F("total") * F("value")),
                                    default="value",
                            )
                    ).aggregate(Sum("coef_applied", default=Decimal(0)))
            )["coef_applied__sum"]
    

    With the example I gave in my question and for item1, the output of this function is Decimal(168.5454...) as expected.