Search code examples
pythondjangopython-itertools

Nested GROUP BY in django: returning Objects


I'm trying to use a classic database GROUP_BY on my queryset in Django.

I've already seen How to query as GROUP BY in django? but the distinction is :

  • that I need to get a dictionnary of objects, not values (indeed not the type of thing returned by values()).
  • that I need to get a nested dictionary (i.e, I need to GROUP_BY on GROUP_BY).

My model:

class CitiesTable(models.Model):
    country = models.TextField() 
    region_or_state = models.TextField() 
    city = models.TextField() 

I want to classify by country, then, among them, by region_or_state where inside there's the list of the city:

I want to get

{'USA':
    {'California': ['San Francisco', 'Los Angeles', ...],
     'Texas': ['Dallas', 'Houston', ...},
 'Germany':
     {'Bavaria': ['Munich', 'Nuremberg', ...}
}

But here, as noted previously, the cities inside the list should be Objects.

I didn't find how to get it in Django so, using itertools in python, I've already succeeded for the outer GROUP_BY: (helping me with this : How to convert tuple to a multi nested dictionary in python?):

from itertools import groupby

def show_cities(request):
    queryset = CitiesTable.objects.all()
    grouped_cities = []
    unique_countries = []
    for k, g in groupby(queryset, lambda x: x.country):
        grouped_cities.append(list(g))
        unique_countries.append(k)
    return render(request, 'cities/show_cities.html', 
                  {'cities':grouped_cities, 'countries':unique_countries})

But I didn't manage to group all the cities objects in their region_or_state.


Solution

  • I'm afraid I don't know how to solve this problem with django's querysets. There may not be a solution, or if there is it probably involves using undocumented features.

    However, the python solution would do fine in most cases. Django would probably have to do something similar itself anyway.

    It looks like there may be a slight issue with your current code. You should almost always sort your entries before using itertools.groupby - it's a common trap, so much so that I'm surprised it doesn't sort automatically. Groupby iterates through the iterable, and creates a new group every time the value of the key changes. So to fully group an iterable, you need to sort first by the key.

    Here you could do the sorting using the database, and simply use groupby twice:

    queryset = CitiesTable.objects.order_by('country', 'region_or_state')
    
    countries_dict = {}
    
    for country, group in groupby(queryset, lambda x: x.country):
        countries_dict[country] = {}
        for region, inner_group in groupby(group, lambda x: x.region_or_state):
            countries_dict[country][region] = list(inner_group)