Search code examples
pythondjangoormleft-join

Django - Left join parent and child tables?


I am trying to do a left join with the least number of database queries.

I have looked into using .select_related, I have tried a raw query and tried to see if it could be done with Q.

I would like to retrieve all States and linked Cities if any are listed.

Models.py

class State(models.Model):
  name = models.CharField(max_length=25)
  abbreviation = models.CharField(max_length=2)

  def __str__(self):
    return str(self.id)


class City(models.Model):
  name = models.CharField(max_length=25)
  population = models.IntegerField()
  state = models.ForeignKey(State, related_name="cities", on_delete=models.CASCADE)

  def __str__(self):
    return str(self.id)

State

ID Name Abbreviation
1 Texas TX
2 California CA
3 Illinois IL

City

ID Name Population State
1 Los Angeles 3769485 2
2 Dallas 1259404 1
3 Houston 2264876 1

Desired result (State columns then City columns)

ID Name Abbreviation ID Name Population State
1 Texas TX 2 Dallas 1259404 1
1 Texas TX 3 Houston 2264876 1
2 California CA 1 Los Angeles 3769485 2
3 Illinois IL

Problem Using .select_related:

cities_states = City.objects.all().select_related('state').order_by('state_id')

Using .select_related does not give me any States (e.g. Illinois) that do not have a City

Problem Using RawQuery:

sql = '''
SELECT S.*, C.* 
FROM "state" S
LEFT JOIN "city" C
ON (S."id" = C."state_id") 
ORDER BY S."id" ASC
'''

cities_states = State.objects.raw(sql)

for obj in cities_states:
  print(obj)

The RawQuery returns the data I need but I don't know how to retrieve the specific field from a specific table. For example: print(obj.name) will give me the State name and print(obj.id) will give me State Id.

How do I pull only a City Name or City Id if both State table and City table each have a "name" and "id" field?

How do I obtain the data with least number of database calls?

How do I differentiate same-name column names (e.g. id, name) of tables when using raw queries?


Solution

  • Your desired result is often not optimal. Indeed, it will here repeat the data for the State every time, making the result from the database much larger, and potentially also increases memory usage by Django a lot.

    To counter that, Django provides .prefetch_related(…) [Django-doc] which will make two queries, the first one for the data for the States, and then fetch the related Citys and do the JOINing in Django/Python.

    We thus can query with:

    states = State.objects.prefetch_related('cities')
    for state in states:
        print(f'for {state}')
        for city in state.cities.all():
            print(f' - {city}')