I experience very weird behaviour of prefetch_related
call. Here is the illustration:
# First define two sketch models, just for convenience of the further talk.
class Secondary(models.Model):
pass
class Primary(models.Model):
secondaries = models.ManyToManyField(Secondary)
# Just to make clear, EVERY Primary object in my system has at least one
# related Secondary object.
# Now prepare a query.
primaries = Primary.objects.filter(...)\
.order_by(...)\
.prefetch_related('secondary')
# Iterating:
for primary in primaries:
if not primary.secondaries.all():
# So we have found an object that is said to not have
# any relatives. Re-query this particular object.
# This part is hit in my code, although it should not.
primary = Primary.objects.get(pk=primary.pk)
for secondary in primary.secondaries.all():
# Voila, there are relatives!
# This part was not hit for some objects until I added
# the re-query part above.
pass
Just to make clear, there are no Primary
objects in my system that do not have related Secondary
objects, but still the code above hits the re-query part for some of them (always the same ones), and re-querying fetches the secondaries that were missing. And what is even more weird, I can see that some Primaries get repeating Secondaries in their secondaries.all()
—the overall impression is that ORM connects some sets of Secondaries to wrong Primaries by mistake.
What could be wrong? Is that some bug of Django, or of the database?
I use Django 1.10.5, psycopg2 2.7.3, and Postgres 9.6.
Update: I have found that the issue is even worse: sometimes the ORM returns incomplete lists of related objects, so my workaround explained above doesn't help. We had to remove the prefetch_related call because obviously we cannot rely on data that it returns.
Update 2: as Daniel asked in comments, here are some real SQL queries (although not from the system where we experienced the problem). backend_build
is the "primary" model, and there are several "secondary" models: backend_buildproblem
, backend_sanityproblem
, and backend_runproblem
—we use django_polymorphic for them, the base model is backend_problem
.
The Python code looks like this:
builds = Build.objects.filter(
branch__active=True,
type__active=True,
finish_timestamp__gt=timezone.now() - timedelta(days=10))\
.order_by('-finish_timestamp')\
.prefetch_related('problems')
for build in builds:
for problem in build.problems.all():
print problem.id # just a stub code to use results of the query.
And here are the resulting SQL queries:
SELECT "backend_build"."teamcity_id", "backend_build"."status", "backend_build"."finish_timestamp", "backend_build"."type_id", "backend_build"."branch_id", "backend_build"."revision"
FROM "backend_build"
INNER JOIN "backend_buildtype" ON ("backend_build"."type_id" = "backend_buildtype"."id")
INNER JOIN "backend_branch" ON ("backend_build"."branch_id" = "backend_branch"."id")
WHERE ("backend_build"."finish_timestamp" > \'2017-08-18T06:35:21.322000+00:00\'::timestamptz AND "backend_buildtype"."active" = true AND "backend_branch"."active" = true)
ORDER BY "backend_build"."finish_timestamp" DESC
SELECT "backend_problem"."id", "backend_problem"."polymorphic_ctype_id", "backend_problem"."generic_type", "backend_problem"."startrack_id", "backend_problem"."useful", "backend_problem"."status", "backend_problem"."summary"
FROM "backend_problem"
INNER JOIN "backend_build_problems" ON ("backend_problem"."id" = "backend_build_problems"."problem_id")
WHERE "backend_build_problems"."build_id" = 18984809
SELECT "backend_problem"."id", "backend_problem"."polymorphic_ctype_id", "backend_problem"."generic_type", "backend_problem"."startrack_id", "backend_problem"."useful", "backend_problem"."status", "backend_problem"."summary", "backend_sanityproblem"."problem_ptr_id", "backend_sanityproblem"."code", "backend_sanityproblem"."latest_occurred"
FROM "backend_sanityproblem"
INNER JOIN "backend_problem" ON ("backend_sanityproblem"."problem_ptr_id" = "backend_problem"."id")
WHERE "backend_sanityproblem"."problem_ptr_id" IN (9251, 9252, 9253, 9254, 9255, 9256, 9257, 9259, 9261, 9262, 9263, 9264, 9268, 9269, 9270, 9271, 9272, 9273, 9274, 9275, 9276, 9277, 9280, 9283, 9285, 9287, 9290, 9293, 9294, 9295, 9297, 9302, 9303, 9304, 9306, 9307, 9309, 9312, 9313, 9314, 9316, 9317, 9319, 9321, 9322, 9062, 9063, 9066, 9068, 9092, 9107, 9109, 9112, 9648, 9649, 9650, 9651, 9652, 9653, 9654, 9655, 9656, 9657, 9658, 9659, 9660, 9661, 9662, 9663, 9664, 9665, 9666, 9667, 9668, 9669, 9670, 9671, 9672, 9673, 9674, 9675, 9676, 9677, 9678, 9679, 9680, 9681, 9682, 9683, 9684, 9685, 9686, 9687, 9688, 9689, 9690, 9691, 9692, 9693, 9694)
SELECT "backend_problem"."id", "backend_problem"."polymorphic_ctype_id", "backend_problem"."generic_type", "backend_problem"."startrack_id", "backend_problem"."useful", "backend_problem"."status", "backend_problem"."summary", "backend_sanityproblem"."problem_ptr_id", "backend_sanityproblem"."code", "backend_sanityproblem"."latest_occurred"
FROM "backend_sanityproblem"
INNER JOIN "backend_problem" ON ("backend_sanityproblem"."problem_ptr_id" = "backend_problem"."id")
WHERE "backend_sanityproblem"."problem_ptr_id" IN (9344, 9345, 9488, 9489, 9508, 9509, 9510, 9511, 9512, 9513, 9399, 9401, 9402, 9403, 9426, 9436, 9572, 9573, 9574, 9575, 9330, 9337, 9338, 9339, 9340, 9341, 9342)
SELECT "backend_problem"."id", "backend_problem"."polymorphic_ctype_id", "backend_problem"."generic_type", "backend_problem"."startrack_id", "backend_problem"."useful", "backend_problem"."status", "backend_problem"."summary"
FROM "backend_problem"
INNER JOIN "backend_build_problems" ON ("backend_problem"."id" = "backend_build_problems"."problem_id")
WHERE "backend_build_problems"."build_id" = 18944441
SELECT "backend_problem"."id", "backend_problem"."polymorphic_ctype_id", "backend_problem"."generic_type", "backend_problem"."startrack_id", "backend_problem"."useful", "backend_problem"."status", "backend_problem"."summary", "backend_buildproblem"."problem_ptr_id", "backend_buildproblem"."stage"
FROM "backend_buildproblem"
INNER JOIN "backend_problem" ON ("backend_buildproblem"."problem_ptr_id" = "backend_problem"."id")
WHERE "backend_buildproblem"."problem_ptr_id" IN (9600)
SELECT "backend_problem"."id", "backend_problem"."polymorphic_ctype_id", "backend_problem"."generic_type", "backend_problem"."startrack_id", "backend_problem"."useful", "backend_problem"."status", "backend_problem"."summary"
FROM "backend_problem"
INNER JOIN "backend_build_problems" ON ("backend_problem"."id" = "backend_build_problems"."problem_id")
WHERE "backend_build_problems"."build_id" = 18944330
There are more queries like these, I omit them here. It seems clear from the above that the system ritst queries the primary model, and then requests relations for each primary object, and it takes their polymorphic types into account.
I suspect your issue stems from the fact that you have multiple secondary models with the same base model. There is probably an internal cache that gets overwritten with each query. Try limiting the prefetch_related
statement to the problems
model:
.prefetch_related('problems')
Or maybe it is related to this issue with django-polymorphic?