In my app, I can describe an Entity using different Protocols, with each Protocol being a collection of various Traits, and each Trait allows two or more Classes. So, a Description is a collection of Expressions. E.g., I want to describe an entity "John" with the Protocol "X" that comprises the following two Traits and Classes:
Protocol ABC
Trait 1: Height
Available Classes: a. Short b. Medium c. Tall
Trait 2: Weight
Available Classes: a. Light b. Medium c. Heavy
John's Description: Expression 1: c. Tall, Expression 2: b. Medium
My model specification (barebone essentials for simplicity):
class Protocol(models.Model):
"""
A Protocol is a collection of Traits
"""
name = models.CharField()
class Trait(models.Model):
"""
Stores the Traits. Each Trait can have multiple Classes
"""
name = models.CharField()
protocol = models.ForeignKey(
Protocol,
help_text="The reference protocol of the trait",
)
class Class(models.Model):
"""
Stores the different Classes related to a Trait.
"""
name = models.CharField()
trait = models.ForeignKey(Trait)
class Description(models.Model):
"""
Stores the Descriptions. A description is a collection of Expressions.
"""
name = models.CharField()
protocol = models.ForeignKey(
Protocol,
help_text="reference to the protocol used to make the description;\
this will define which Traits will be available",
)
entity = models.ForeignKey(
Entity,
help_text="the Entity to which the description refers to",
)
class Expression(models.Model):
"""
Stores the expressions of entities related to a specific
Description. It refers to one particular Class (which is
then associated with a specific Trait)
"""
class = models.ForeignKey(Class)
description = models.ForeignKey(Description)
Following the previous example, let's say I want to find all the Entities that are medium or tall (Trait 1) and heavy (Trait 2). The query I'm now using is the following:
# This is the filter returned by the HTML form, which list
# all the available Classes for each Trait of the selected Protocol
filters = [
{'trait': 1, 'class': [2, 3]},
{'trait': 2, 'class': [6,]},
]
queryset = Description.objects.all()
for filter in filters:
queryset = queryset.filter(expression_set__class__in=filter["class"])
The problem is that the query is slow (I have ATM ~1000 Descriptions, described with
a Protocol of 40 Traits, each Trait having 2 to 5 Classes). It takes about two
seconds to return the results even when filtering by only 5-6 Expressions.
I tried using prefetch_related("expression_set")
or
prefetch_related("expression_set__class")
but with no significant improvement.
The question is: can you suggest a way to improve the performance, or this is simply the reality of searching through so many tables?
Thank you very much for your time.
EDIT: The following is the query generated by the Manager when, e.g., eight filters (see previous code snippet) are applied.
SELECT "describe_description"."id",
"describe_description"."name",
"describe_description"."protocol_id",
FROM "describe_description"
INNER JOIN "describe_expression"
ON ("describe_description"."id" = "describe_expression"."description_id")
INNER JOIN "describe_expression" T4
ON ("describe_description"."id" = T4."description_id")
INNER JOIN "describe_expression" T6
ON ("describe_description"."id" = T6."description_id")
INNER JOIN "describe_expression" T8
ON ("describe_description"."id" = T8."description_id")
INNER JOIN "describe_expression" T10
ON ("describe_description"."id" = T10."description_id")
INNER JOIN "describe_expression" T12
ON ("describe_description"."id" = T12."description_id")
INNER JOIN "describe_expression" T14
ON ("describe_description"."id" = T14."description_id")
INNER JOIN "describe_expression" T16
ON ("describe_description"."id" = T16."description_id")
INNER JOIN "describe_expression" T18
ON ("describe_description"."id" = T18."description_id")
WHERE ("describe_expression"."class_id" IN (732) AND T4."class_id" IN (740) AND T6."class_id" IN (760) AND T8."class_id" IN (783) AND T10."class_id" IN (794) AND T12."class_id" IN (851) AND T14."class_id" IN (857) AND T16."class_id" IN (860) AND T18."class_id" IN (874))
First you should avoid multiple joins by aggregating desired filters upfront:
filters = [
{'trait': 1, 'class': [2, 3]},
{'trait': 2, 'class': [6,]},
]
queryset = Description.objects.all()
class_filter = []
for filter_entry in filters:
class_filter.append(filter_entry["class"])
queryset = queryset.filter(expression_set__class__in=class_filter)
Second problem is scanning for text values. Use db_index=True
on your Class.name field.
EDIT: There is a difference in chaining the filters on the same table and using Q objects. It does not act like AND on the same object. It seems counterintuitive as in the sql you can see AND but that is AND on multiple joins, where each join effectively duplicates the descriptions (that is why it get's slow). Best explained in the Django docs or this article.
Quick excerpt of the docs:
To select all blogs containing at least one entry from 2008 having “Lennon” in its headline (the same entry satisfying both conditions), we would write:
Blog.objects.filter(entry__headline__contains='Lennon', entry__pub_date__year=2008)
Otherwise, to perform a more permissive query selecting any blogs with merely some entry with “Lennon” in its headline and some entry from 2008, we would write:
Blog.objects.filter(entry__headline__contains='Lennon').filter(entry__pub_date__year=2008)
EDIT 2: Schematic example of above from this answer:
Blog.objects.filter(entry__headline_contains='Lennon',
entry__pub_date__year=2008)
filters only Blog 1
Blog.objects.filter(entry__headline_contains='Lennon').filter(
entry__pub_date__year=2008)
filters blog 1 and 2