Search code examples
djangodjango-polymorphic

Django ORM messes related objects with prefetch_related


I experience very weird behaviour of prefetch_related call. Here is the illustration:

# First define two sketch models, just for convenience of the further talk.

class Secondary(models.Model):
    pass

class Primary(models.Model):
    secondaries = models.ManyToManyField(Secondary)

# Just to make clear, EVERY Primary object in my system has at least one
# related Secondary object.

# Now prepare a query.

primaries = Primary.objects.filter(...)\
                           .order_by(...)\
                           .prefetch_related('secondary')

# Iterating:

for primary in primaries:
    if not primary.secondaries.all():
        # So we have found an object that is said to not have
        # any relatives.  Re-query this particular object.
        # This part is hit in my code, although it should not.
        primary = Primary.objects.get(pk=primary.pk)
    for secondary in primary.secondaries.all():
        # Voila, there are relatives!
        # This part was not hit for some objects until I added
        # the re-query part above.
        pass

Just to make clear, there are no Primary objects in my system that do not have related Secondary objects, but still the code above hits the re-query part for some of them (always the same ones), and re-querying fetches the secondaries that were missing. And what is even more weird, I can see that some Primaries get repeating Secondaries in their secondaries.all()—the overall impression is that ORM connects some sets of Secondaries to wrong Primaries by mistake.

What could be wrong? Is that some bug of Django, or of the database?

I use Django 1.10.5, psycopg2 2.7.3, and Postgres 9.6.

Update: I have found that the issue is even worse: sometimes the ORM returns incomplete lists of related objects, so my workaround explained above doesn't help. We had to remove the prefetch_related call because obviously we cannot rely on data that it returns.

Update 2: as Daniel asked in comments, here are some real SQL queries (although not from the system where we experienced the problem). backend_build is the "primary" model, and there are several "secondary" models: backend_buildproblem, backend_sanityproblem, and backend_runproblem—we use django_polymorphic for them, the base model is backend_problem.

The Python code looks like this:

builds = Build.objects.filter(
    branch__active=True,
    type__active=True,
    finish_timestamp__gt=timezone.now() - timedelta(days=10))\
 .order_by('-finish_timestamp')\
 .prefetch_related('problems')

for build in builds:
  for problem in build.problems.all():
    print problem.id  # just a stub code to use results of the query.

And here are the resulting SQL queries:

SELECT "backend_build"."teamcity_id", "backend_build"."status", "backend_build"."finish_timestamp", "backend_build"."type_id", "backend_build"."branch_id", "backend_build"."revision"
  FROM "backend_build"
  INNER JOIN "backend_buildtype" ON ("backend_build"."type_id" = "backend_buildtype"."id")
  INNER JOIN "backend_branch" ON ("backend_build"."branch_id" = "backend_branch"."id")
  WHERE ("backend_build"."finish_timestamp" > \'2017-08-18T06:35:21.322000+00:00\'::timestamptz AND "backend_buildtype"."active" = true AND "backend_branch"."active" = true)
  ORDER BY "backend_build"."finish_timestamp" DESC 

SELECT "backend_problem"."id", "backend_problem"."polymorphic_ctype_id", "backend_problem"."generic_type", "backend_problem"."startrack_id", "backend_problem"."useful", "backend_problem"."status", "backend_problem"."summary"
  FROM "backend_problem"
  INNER JOIN "backend_build_problems" ON ("backend_problem"."id" = "backend_build_problems"."problem_id")
  WHERE "backend_build_problems"."build_id" = 18984809

SELECT "backend_problem"."id", "backend_problem"."polymorphic_ctype_id", "backend_problem"."generic_type", "backend_problem"."startrack_id", "backend_problem"."useful", "backend_problem"."status", "backend_problem"."summary", "backend_sanityproblem"."problem_ptr_id", "backend_sanityproblem"."code", "backend_sanityproblem"."latest_occurred"
  FROM "backend_sanityproblem"
  INNER JOIN "backend_problem" ON ("backend_sanityproblem"."problem_ptr_id" = "backend_problem"."id")
  WHERE "backend_sanityproblem"."problem_ptr_id" IN (9251, 9252, 9253, 9254, 9255, 9256, 9257, 9259, 9261, 9262, 9263, 9264, 9268, 9269, 9270, 9271, 9272, 9273, 9274, 9275, 9276, 9277, 9280, 9283, 9285, 9287, 9290, 9293, 9294, 9295, 9297, 9302, 9303, 9304, 9306, 9307, 9309, 9312, 9313, 9314, 9316, 9317, 9319, 9321, 9322, 9062, 9063, 9066, 9068, 9092, 9107, 9109, 9112, 9648, 9649, 9650, 9651, 9652, 9653, 9654, 9655, 9656, 9657, 9658, 9659, 9660, 9661, 9662, 9663, 9664, 9665, 9666, 9667, 9668, 9669, 9670, 9671, 9672, 9673, 9674, 9675, 9676, 9677, 9678, 9679, 9680, 9681, 9682, 9683, 9684, 9685, 9686, 9687, 9688, 9689, 9690, 9691, 9692, 9693, 9694)

SELECT "backend_problem"."id", "backend_problem"."polymorphic_ctype_id", "backend_problem"."generic_type", "backend_problem"."startrack_id", "backend_problem"."useful", "backend_problem"."status", "backend_problem"."summary", "backend_sanityproblem"."problem_ptr_id", "backend_sanityproblem"."code", "backend_sanityproblem"."latest_occurred"
  FROM "backend_sanityproblem"
  INNER JOIN "backend_problem" ON ("backend_sanityproblem"."problem_ptr_id" = "backend_problem"."id")
  WHERE "backend_sanityproblem"."problem_ptr_id" IN (9344, 9345, 9488, 9489, 9508, 9509, 9510, 9511, 9512, 9513, 9399, 9401, 9402, 9403, 9426, 9436, 9572, 9573, 9574, 9575, 9330, 9337, 9338, 9339, 9340, 9341, 9342)

SELECT "backend_problem"."id", "backend_problem"."polymorphic_ctype_id", "backend_problem"."generic_type", "backend_problem"."startrack_id", "backend_problem"."useful", "backend_problem"."status", "backend_problem"."summary"
  FROM "backend_problem"
  INNER JOIN "backend_build_problems" ON ("backend_problem"."id" = "backend_build_problems"."problem_id")
  WHERE "backend_build_problems"."build_id" = 18944441

SELECT "backend_problem"."id", "backend_problem"."polymorphic_ctype_id", "backend_problem"."generic_type", "backend_problem"."startrack_id", "backend_problem"."useful", "backend_problem"."status", "backend_problem"."summary", "backend_buildproblem"."problem_ptr_id", "backend_buildproblem"."stage"
  FROM "backend_buildproblem"
  INNER JOIN "backend_problem" ON ("backend_buildproblem"."problem_ptr_id" = "backend_problem"."id")
  WHERE "backend_buildproblem"."problem_ptr_id" IN (9600)

SELECT "backend_problem"."id", "backend_problem"."polymorphic_ctype_id", "backend_problem"."generic_type", "backend_problem"."startrack_id", "backend_problem"."useful", "backend_problem"."status", "backend_problem"."summary"
  FROM "backend_problem"
  INNER JOIN "backend_build_problems" ON ("backend_problem"."id" = "backend_build_problems"."problem_id")
  WHERE "backend_build_problems"."build_id" = 18944330

There are more queries like these, I omit them here. It seems clear from the above that the system ritst queries the primary model, and then requests relations for each primary object, and it takes their polymorphic types into account.


Solution

  • I suspect your issue stems from the fact that you have multiple secondary models with the same base model. There is probably an internal cache that gets overwritten with each query. Try limiting the prefetch_related statement to the problems model:

    .prefetch_related('problems')
    

    Or maybe it is related to this issue with django-polymorphic?