Search code examples
djangoperformancedjango-modelsdjango-rest-frameworkquery-performance

Optimize Django Rest ORM queries


I a react front-end, django backend (used as REST back). I've inherited the app, and it loads all the user data using many Models and Serializes. It loads very slow. It uses a filter to query for a single member, then passes that to a Serializer:

found_account = Accounts.objects.get(id='customer_id')
AccountDetailsSerializer(member, context={'request': request}).data

Then there are so many various nested Serializers:

AccountDetailsSerializers(serializers.ModelSerializer):
   Invoices = InvoiceSerializer(many=True)
   Orders = OrderSerializer(many=True)
   ....

From looking at the log, looks like the ORM issues so many queries, it's crazy, for some endpoints we end up with like 50 - 60 queries.

  1. Should I attempt to look into using select_related and prefetch or would you skip all of that and just try to write one sql query to do multiple joins and fetch all the data at once as json?

  2. How can I define the prefetch / select_related when I pass in a single object (result of get), and not a queryset to the serializer?

  3. Some db entities don't have links between them, meaning not fk or manytomany relationships, just hold a field that has an id to another, but the relationship is not enforced in the database? Will this be an issue for me? Does it mean once more that I should skip the select_related approach and write a customer sql for fetching?

  4. How would you suggest to approach performance tuning this nightmare of queries?


Solution

  • There's no silver bullet without looking at the code (and the profiling results) in detail.

    The only thing that is a no-brainer is enforcing relationships in the models and in the database. This prevents a whole host of bugs, encourages the use of standardized, performant access (rather than concocting SQL on the spot which more often than not is likely to be buggy and slow) and makes your code both shorter and a lot more readable.

    Other than that, 50-60 queries can be a lot (if you could do the same job with one or two) or it can be just right - it depends on what you achieve with them.

    The use of prefetch_related and select_related is important, yes – but only if used correctly; otherwise it can slow you down instead of speeding you up.

    Nested serializers are the correct approach if you need the data – but you need to set up your querysets properly in your viewset if you want them to be fast.

    Time the main parts of slow views, inspect the SQL queries sent and check if you really need all data that is returned.

    Then you can look at the sore spots and gain time where it matters. Asking specific questions on SO with complete code examples can also get you far fast.


    If you have just one top-level object, you can refine the approach offered by @jensmtg, doing all the prefetches that you need at that level and then for the lower levels just using ModelSerializers (not SerializerMethodFields) that access the prefetched objects. Look into the Prefetch object that allows nested prefetching.

    But be aware that prefetch_related is not for free, it involves quite some processing in Python; you may be better off using flat (db-view-like) joined queries with values() and values_list.