Tags: google-app-engine, django-nonrel, django-select-related

How can I mimic 'select_related' using google-appengine and django-nonrel?


The django-nonrel documentation states: "you have to manually write code for merging the results of multiple queries (JOINs, select_related(), etc.)".

Can someone point me to any snippets that manually add the related data? @nickjohnson has an excellent post showing how to do this with plain App Engine models, but I'm using django-nonrel.

For my particular use case, I'm trying to get the UserProfiles together with their related User models. This should take just two simple queries, followed by matching up the data.

However, using django-nonrel, a new query gets fired off for each result in the queryset. How can I get access to the related items in a 'select_related' sort of way?
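Why the query count grows with the result set can be shown with a toy model of lazy foreign-key access. `QueryLog`, `USERS` and `LazyProfile` below are hypothetical and only imitate the pattern rather than reproduce Django's descriptor: the raw `user_id` is available for free, but reading `.user` on an instance without a cached value costs one lookup, so displaying n profiles costs n extra queries.

```python
class QueryLog(object):
    """Counts simulated datastore queries."""

    def __init__(self):
        self.count = 0


USERS = {1: "ann", 2: "bob", 3: "cy"}  # hypothetical user table: pk -> username


class LazyProfile(object):
    """Hypothetical profile whose .user is loaded on first access, like a lazy FK."""

    def __init__(self, user_id, log):
        self.user_id = user_id  # raw foreign key, available without a query
        self._log = log
        self._user_cache = None

    @property
    def user(self):
        if self._user_cache is None:
            self._log.count += 1  # one simulated query per uncached access
            self._user_cache = USERS[self.user_id]
        return self._user_cache


log = QueryLog()
profiles = [LazyProfile(pk, log) for pk in (1, 2, 3)]
names = [profile.user for profile in profiles]
print(names)      # ['ann', 'bob', 'cy']
print(log.count)  # 3: one query per row
```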

I've tried this, but it doesn't seem to work as I'd expect. Looking at the RPC stats, it still seems to be firing a query for each item displayed.

from django.contrib.auth.models import User
# UserProfile is the project's own profile model (its import path is not given here).

def get_matching_model(key, queryset):
    """Return the first model in queryset whose pk equals key, or None."""
    return next((model for model in queryset if model.pk == key), None)

all_profiles = UserProfile.objects.all()
user_pks = set()
for profile in all_profiles:
    user_pks.add(profile.user_id)  # a way to access the pk without triggering the query

users = User.objects.filter(pk__in=user_pks)
for profile in all_profiles:
    profile.user = get_matching_model(profile.user_id, users)

UPDATE: Ick... I figured out what my issue was.

I was trying to improve the efficiency of the changelist_view in the Django admin. It seemed that the select_related logic above was still producing an additional query for each row in the result set whenever a foreign key was in my 'list_display'. However, I traced it to something different. The above logic does not produce multiple queries (though it looks a lot cleaner if you follow Nick Johnson's approach more closely).
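A tidier matching step, closer in spirit to the dict-based approach the solution below adapts from Nick Johnson, builds one `{pk: user}` dictionary from the second query instead of scanning the users once per profile as `get_matching_model` does. This sketch is Django-free: `User`, `Profile` and `attach_users` are hypothetical stand-ins, not the real models or an existing helper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    """Hypothetical stand-in for django.contrib.auth's User model."""
    pk: int
    username: str


@dataclass
class Profile:
    """Hypothetical stand-in for UserProfile; user_id holds the raw foreign key."""
    user_id: int
    user: Optional[User] = None


def attach_users(profiles, users):
    """Attach each profile's User with dict lookups instead of a scan per profile."""
    users_by_pk = {user.pk: user for user in users}  # built once from the second query
    for profile in profiles:
        profile.user = users_by_pk.get(profile.user_id)  # None if no matching user
    return profiles


profiles = attach_users(
    [Profile(user_id=1), Profile(user_id=2), Profile(user_id=3)],
    [User(pk=1, username="ann"), User(pk=2, username="bob")],
)
print([p.user.username if p.user else None for p in profiles])  # ['ann', 'bob', None]
```

With n profiles and m users this does O(n + m) work instead of O(n × m), while still needing only the two queries.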

The issue is that in django.contrib.admin.views.main, on line 117 in the ChangeList.get_results method, there is the following code: result_list = self.query_set._clone(). So even though I was correctly overriding the queryset in the admin and attaching the related objects, this line cloned the queryset. A clone does not carry over the already-evaluated results, so the model instances holding the attributes I had added for my 'select related' were thrown away and the rows were fetched again, resulting in an even less efficient page load than when I started.
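The effect can be reproduced with a toy model of a lazy, caching queryset. `Row` and `ToyQuerySet` are hypothetical and only imitate the behaviour described above: iterating fills a result cache with fresh instances, and `_clone()` returns a new object with an empty cache, so attributes set on the cached instances disappear and the data is queried again.

```python
class Row(object):
    """Plain object standing in for a model instance."""

    def __init__(self, pk):
        self.pk = pk


class ToyQuerySet(object):
    """Hypothetical, heavily simplified imitation of a lazy, caching queryset."""

    def __init__(self, pks, log):
        self._pks = pks
        self._log = log  # shared list recording each simulated database query
        self._result_cache = None

    def __iter__(self):
        if self._result_cache is None:
            self._log.append("query")  # first evaluation runs a query
            self._result_cache = [Row(pk) for pk in self._pks]  # fresh instances
        return iter(self._result_cache)

    def _clone(self):
        # Copy the query definition, but not the cached results.
        return ToyQuerySet(self._pks, self._log)


log = []
qs = ToyQuerySet([1, 2], log)
for row in qs:
    row.user = "prefetched"  # what select-related-style code attaches

print([hasattr(row, "user") for row in qs])           # [True, True]: served from the cache
print([hasattr(row, "user") for row in qs._clone()])  # [False, False]: re-queried, fresh rows
print(len(log))                                       # 2
```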

Not sure what to do about it yet, but the code that selects related stuff is just fine.


Solution

  • I don't like answering my own question, but the answer might help others.

    Here is my solution for attaching related items to a queryset, based entirely on Nick Johnson's solution linked above.

    
    from collections import defaultdict
    
    def get_with_related(queryset, *attrs):
        """
        Attach related objects to every model in a queryset using one query
        per related attribute, instead of one query per row on access.
    
        Each attr must be a ForeignKey or OneToOneField on the queryset's model.
        """
        # Pair every entity with every related attribute to fetch. Iterating
        # the queryset evaluates it and fills its result cache, so the same
        # instances are modified below and returned to the caller.
        fields = [(model, attr) for model in queryset for attr in attrs]
    
        # We need one query per related attribute, because I don't know how to
        # get everything at once. So collect the pks to fetch per attribute.
        ref_keys = defaultdict(set)
        for model, attr in fields:
            pk = get_value_for_datastore(model, attr)
            if pk is not None:  # nullable relations have nothing to fetch
                ref_keys[attr].add(pk)
    
        # Now run one query per attribute and store the results in a
        # {pk: model} dict for easy matching later.
        ref_models = {}
        for attr, pk_vals in ref_keys.items():
            related_model = queryset.model._meta.get_field(attr).rel.to
            related_queryset = related_model.objects.filter(pk__in=pk_vals)
            ref_models[attr] = dict((x.pk, x) for x in related_queryset)
    
        # Finally, put the related items on their models. Skip keys with no
        # match instead of assigning None (a non-nullable ForeignKey may reject it).
        for model, attr in fields:
            related = ref_models.get(attr, {}).get(get_value_for_datastore(model, attr))
            if related is not None:
                setattr(model, attr, related)
    
        return queryset
    
    def get_value_for_datastore(model, attr):
        """
        Return the related object's pk without fetching it. Django stores a
        foreign key's raw value in a '<field>_id' attribute, so reading it
        does not trigger a query.
        """
        return getattr(model, attr + '_id')
    

    To make the admin use the related objects fetched this way, we have to jump through a couple of hoops. Here is what I've done. The only change in the 'get_results' method of 'AppEngineRelatedChangeList', compared with Django's own ChangeList.get_results, is that I removed self.query_set._clone() and used self.query_set directly.

    
    from django.contrib import admin
    from django.contrib.admin.options import IncorrectLookupParameters
    from django.contrib.admin.views.main import ChangeList, MAX_SHOW_ALL_ALLOWED
    from django.core.paginator import InvalidPage
    
    class UserProfileAdmin(admin.ModelAdmin):
        list_display = ('username', 'user', 'paid')
        select_related_fields = ['user']
    
        def get_changelist(self, request, **kwargs):
            return AppEngineRelatedChangeList
    
    class AppEngineRelatedChangeList(ChangeList):
    
        def get_query_set(self):
            qs = super(AppEngineRelatedChangeList, self).get_query_set()
            related_fields = getattr(self.model_admin, 'select_related_fields', [])
            # get_with_related() is the helper defined in the previous snippet.
            return get_with_related(qs, *related_fields)
    
        def get_results(self, request):
            paginator = self.model_admin.get_paginator(request, self.query_set, self.list_per_page)
            # Get the number of objects, with admin filters applied.
            result_count = paginator.count
    
            # Get the total number of objects, with no admin filters applied.
            # Perform a slight optimization: Check to see whether any filters were
            # given. If not, use paginator.hits to calculate the number of objects,
            # because we've already done paginator.hits and the value is cached.
            if not self.query_set.query.where:
                full_result_count = result_count
            else:
                full_result_count = self.root_query_set.count()
    
            can_show_all = result_count <= MAX_SHOW_ALL_ALLOWED
            multi_page = result_count > self.list_per_page
    
            # Get the list of objects to display on this page.
            if (self.show_all and can_show_all) or not multi_page:
                # Django uses self.query_set._clone() here; reusing the evaluated
                # queryset keeps the instances that carry the related objects.
                result_list = self.query_set
            else:
                try:
                    result_list = paginator.page(self.page_num+1).object_list
                except InvalidPage:
                    raise IncorrectLookupParameters
    
            self.result_count = result_count
            self.full_result_count = full_result_count
            self.result_list = result_list
            self.can_show_all = can_show_all
            self.multi_page = multi_page
            self.paginator = paginator
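For experimenting outside Django, the batching idea behind `get_with_related` can be sketched in plain Python. This is an illustrative analog, not the code above: `prefetch_related_attrs` and the `fetchers` mapping (attribute name to a callable that loads objects for a set of pks, standing in for `Model.objects.filter(pk__in=...)`) are hypothetical. It issues one fetch per attribute regardless of the number of rows, and skips rows whose foreign key is `None`.

```python
from types import SimpleNamespace


def prefetch_related_attrs(rows, fetchers):
    """Attach related objects to rows using one batched fetch per attribute.

    rows:     objects exposing '<attr>_id' attributes (stand-ins for model instances).
    fetchers: {attr: callable(set_of_pks) -> iterable of objects with a .pk}.
    """
    rows = list(rows)  # evaluate once so the returned instances carry the attributes
    for attr, fetch in fetchers.items():
        pks = {getattr(row, attr + "_id") for row in rows}
        pks.discard(None)  # nullable relations have nothing to fetch
        by_pk = {obj.pk: obj for obj in fetch(pks)} if pks else {}
        for row in rows:
            related = by_pk.get(getattr(row, attr + "_id"))
            if related is not None:
                setattr(row, attr, related)
    return rows


calls = []


def fetch_users(pks):
    """Hypothetical batch loader standing in for User.objects.filter(pk__in=pks)."""
    calls.append(sorted(pks))  # record each simulated query
    return [SimpleNamespace(pk=pk, username="user%d" % pk) for pk in pks]


rows = prefetch_related_attrs(
    [SimpleNamespace(user_id=1), SimpleNamespace(user_id=2),
     SimpleNamespace(user_id=1), SimpleNamespace(user_id=None)],
    {"user": fetch_users},
)
print(calls)                     # [[1, 2]]: one fetch for four rows
print(rows[2].user.username)     # user1
print(hasattr(rows[3], "user"))  # False
```

As in `get_with_related`, the number of fetches depends on the number of related attributes, not on the number of rows.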