Search code examples
pythondjangodjango-querysetdjango-q

Django: Building dynamic Q queries for related tables


[EDIT]

I have created an example Django Repl.it playground preloaded with this exact case:

https://repl.it/@mormoran/Django-Building-dynamic-Q-queries-for-related-tables

[/EDIT]

I'm trying to filter objects on a table, based on related objects, but having trouble doing so.

I have a table Run:

class Run(models.Model):
    start_time = models.DateTimeField(db_index=True)
    end_time = models.DateTimeField()

Each Run object has related table RunValue:

class RunValue(models.Model):
    run = models.ForeignKey(Run, on_delete=models.CASCADE)
    run_parameter = models.CharField(max_length=50)
    value = models.FloatField(default=0)

In RunValue we store detailed characteristics of a run, called a run_parameter. Things such as, voltage, temperature, pressure, etc.

For simplicity's sake, let's assume the fields I want to filter on are "Min. Temperature" and "Max. Temperature".

So for example:

Run 1:
    Run Values:
        run_parameter: "Min. Temperature", value: 430
        run_parameter: "Max. Temperature", value: 436

Run 2:
    Run Values:
        run_parameter: "Min. Temperature", value: 627
        run_parameter: "Max. Temperature", value: 671

Run 3:
    Run Values:
        run_parameter: "Min. Temperature", value: 642
        run_parameter: "Max. Temperature", value: 694

Run 4:
    Run Values:
        run_parameter: "Min. Temperature", value: 412
        run_parameter: "Max. Temperature", value: 534

(RunValue.value are floats, but let's keep it to ints for simplicity).

I have two HTML inputs in my page where the user enters min and max (for temperatures). They can both be left blank, or just one, or both, so it's an open ended filter, it can define a range to filter on, or not. For example, if the user were to input:

Min. temperature = 400
Max. temperature = 500

That set of filters should only return Run 1 from the above Run instance examples, where the lower threshold is above 400 and the upper threshold is below 500. All other Run do not qualify.

So then I need to return all Run object instances where RunValue matches the filters the user has input.

This is what I have tried:

# Grabbing temp ranges from request and setting default filter mins and maxs:
temp_ranges = [0, 999999] # Defaults in case the user does not set anything

if min_temp_filter:
    temp_ranges = [min_temp_filter, 999999]

if max_temp_filter:
    temp_ranges = [0, max_temp_filter]

if min_temp_filter and max_temp_filter:
    temp_ranges = [min_temp_filter, max_temp_filter]

# Starting Q queries
temp_q_queries = [
    Q(runvalue__run_parameter__icontains='Min. Temperature'),
    Q(runvalue__run_parameter__icontains='Max. Temperature')
]

queryset = models.Q(reduce(operator.or_, temp_q_queries), runvalue__value__range=temp_ranges)
filtered_run_instances = Run.objects.filter(queryset)

Running that yields some results, but not the desired results. It returns Run 1 and Run 4, when it should only return Run 1.

The temp_ranges being from 400 to 500, Run 1 qualifies, but the max temperature of Run 4 is over 500, it should NOT qualify. The filters need to exclude object instances by looking at both ranges at the same time, the minimum AND the maximum.

The printed query is as follows:

(AND: (OR: ('runvalue__run_parameter__icontains', 'Min. Temperaure'), ('runvalue__run_parameter__icontains', 'Max. Temperature')), ('runvalue__value__range', ['400', '500']))

What I think I need to be filtering in pseudo code:

All Runs that have RunValue instances where the RunValue.run_parameter is either "Min. Temperature" OR "Max. Temperature" AND the RunValue.value are between 400 and 500.

I then thought I should be including the value ranges in the Q queries as a regular Django filter, separated by commas:

temp_q_queries = [
    Q(runvalue__run_parameter__icontains='Min. Temperature', runvalue__value__range=temp_ranges),
    Q(runvalue__run_parameter__icontains='Max. Temperature', runvalue__value__range=temp_ranges)
]

queryset = models.Q(reduce(operator.or_, temp_q_queries))
filtered_run_instances = Run.objects.filter(queryset)

Same result, so the value range is not the issue there, it's the logic grouping (I think?).

So I tried to do two reduce Q queries (a bit gnarly looking), so as to say:

All Runs that have RunValue instances where the name is "Min. Temperature" AND the values are higher than 400, AND all Runs that have RunValue instances where the name is "Max. Temperature" AND the values are lower than 500

temp_q_queries = [
    models.Q(reduce(operator.and_, [Q(runvalue__run_parameter__icontains='Min. Temperature'), Q(runvalue__value__gte=temp_ranges[0])]),
    models.Q(reduce(operator.and_, [Q(runvalue__run_parameter__icontains='Max. Temperature'), Q(runvalue__value__lte=temp_ranges[1])]))
]

queryset = models.Q(reduce(operator.and_, temp_q_queries))
filtered_run_instances = Run.objects.filter(queryset)

(Note all 3 reduce where changed to AND gates)

This yielded 0 hits.

Using the same compound reduce method for temp_q_queries but changing the outter logic gate for queryset to OR yields the same wrong results, Run 1 and Run 4:

queryset = models.Q(reduce(operator.or_, temp_q_queries))
filtered_run_instances = Run.objects.filter(queryset)

Perhaps I am over complicating myself here and there's something very simple I'm not seeing (I've been trying to solve this logic puzzle for 2 days now, getting a bit of tunnel vision. But I'm rather hoping it's solvable, and simple.

Any help or questions would be greatly appreciated.


Solution

  • Your problem was that you need to meet both conditions and they are never valid both on the same row of RunValue related table. You want to select root objects that have a row "Min. Temperature" in that range and similarly also a "Max. Temperature" valid row. You must use subqueries.

    The best is to use Django 3.0 Exists() subquery condition. It can be easily customized for an old Django.

    A concrete example:

    from django.db.models import Exists, OuterRef
    
    queryset = Run.objects.filter(
        Exists(RunValue.objects.filter(
            run=OuterRef('pk'),
            run_parameter='Min. temperature',
            value__gte=400)),
        Exists(RunValue.objects.filter(
            run=OuterRef('pk'),
            run_parameter='Max. temperature',
            value__lte=500)),
    )
    

    The same by a general solution, because you want a dynamic filter:

    filter_data = {
        'Min. temperature': 400,
        'Max. temperature': 500,
    }
    
    param_operators = {
        'Min. Temperature': 'gte',
        'Max. Temperature': 'lte',
        # much more supported parameters... e.g. 'some boolean as 0 or 1': 'eq'.
    }
    
    conditions = []
    for key, value in filter_data.items():
        if value is not None:
            conditions.append(Exists(RunValue.objects.filter(
                run=OuterRef('pk'),
                run_parameter=key,
                **{'value__{}'.format(param_operators[key]): value}
            )))
    queryset = Run.objects.filter(*conditions)
    

    You know that "Min. Temperature" <= "Max. Temperature", but the database optimizer doesn't know it. I optimized it by removing a superfluous condition of the range. It is also better to completely remove a useless condition "Max. Temperature" <= 999999.

    This answer can be easily customized for Django >=1.11 <= 2.2 Exists() condition after you read approximately tens lines of that documentation.

    You don't need a Q() object in this simple case, even if you want to rewrite it by short one-line expressions and add mnemonic temporary variables.


    EDIT The concrete example can be rewritten for Django < 3.0 this way

    queryset = Run.objects.annotate(
        min_temperature_filter=Exists(RunValue.objects.filter(
            run=OuterRef('pk'),
            run_parameter='Min. temperature',
            value__gte=400)),
        max_temperature_filter=Exists(RunValue.objects.filter(
            run=OuterRef('pk'),
            run_parameter='Max. temperature',
            value__lte=500)),
    ).filter(
        min_temperature_filter=True,
        max_temperature_filter=True,
    )