Search code examples
pythondjangodjango-modelsdjango-mptt

How should I structure a tree of multiple model types?


I'm trying to model television shows down to the episode level. Given that each level of the tree (network, series, season, episode) has different fields, I want to use different model types for each level.

My initial approach was to keep track of the parent with a foreign key at each level (this is a simplified approach, I know there would be other fields):

class Network(models.Model):
   ...

class Series(models.Model):
   network = models.ForeignKey(Network)
   ...

class Season(models.Model):
   series = models.ForeignKey(Series)
   ...

class Episode(models.Model):
   season = models.ForeignKey(Season)
   ...

But if I want to get the network of a specific episode, I have to lookup Episode->Season->Series->Network. That seems inefficient and poorly architected because it requires a lot of queries.

I saw the library django-mptt, but this requires that your tree be built of a single model type.

From a design standpoint, what's the standard way to structure this type of tree? What are the tradeoffs of different approaches?


Solution

  • It is not so inefficient. It requires "only" three joins to get the network for a certain episode.

    You can make your life easier if you create a cached_property on your Episode model:

    class Network(models.Model):
        name = models.CharField(max_length=255)
    
    # ...
    
    class Episode(models.Model):
        season = models.ForeignKey(Season, on_delete=models.CASCADE)
    
        @cached_property
        def network(self):
            return self.season.series.network
    
        @cached_property
        def network_name(self):
            return self.season.series.network.name
    

    This will be expensive to use if you do not annotate that value before accessing it, but it will always work, even if you forget to do that.

    The good thing about cached_property is that it can be overridden by setting that attribute on the instance which is exactly what django does when we annotate a value:

    episodes = Episode.objects.annotate(network_name=F('season__series__network__name'))
    for episode in episodes:
       print(episode.pk, episode.network_name)
    

    By annotating the network name before accessing the episode django will know that is has to join that name. This is how the query looks like:

    SELECT 
        "main_episode"."id", 
        "main_episode"."name", 
        "main_episode"."season_id", 
        "main_network"."name" AS "network_name" 
    FROM "main_episode" 
    INNER JOIN "main_season" ON ("main_episode"."season_id" = "main_season"."id") 
    INNER JOIN "main_series" ON ("main_season"."series_id" = "main_series"."id") 
    INNER JOIN "main_network" ON ("main_series"."network_id" = "main_network"."id")
    

    You can see that it joined the network onto the episode in advance. So this is one query with three joins. Joins do have a cost but you should not worry about that until you experience performance issues.

    Live example