Search code examples
pythondjangodjango-pagination

Django Paginate CPU Time scaling with number of selected objects not displayed objects


I have a simple database with about 3900 entries, and am using a generic view (django.views.generic.list_detail.object_list) with its django-pagination (through paginate_by) to browse the data in the database, but some queries are very slow.

The weird thing is that despite only showing 50 objects per page the rendering time scales roughly linearly with how many objects are selected (and I do not do any sorting of objects). E.g., if I do a query with ~3900, ~1800, ~900, ~54 selected objects it respectively takes ~8500 ms, ~4000 ms, ~2500 ms, ~800 ms of CPU time (using django-debug-toolbar) while the SQL only took ~50 ms, ~40 ms, ~35 ms, ~30 ms, again while all pages had exactly 50 objects. I have minimized the number of SQL queries using select_related as suggested in the django optimization page.

Using profiling middleware the vast majority of the time on long queries is spent doing db stuff:

         735924 function calls (702255 primitive calls) in 11.950 CPU seconds

   Ordered by: internal time, call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
35546/3976    4.118    0.000    9.585    0.002 /usr/local/lib/python2.6/dist-packages/django/db/models/query.py:1120(get_cached_row)
    30174    3.589    0.000    3.991    0.000 /usr/local/lib/python2.6/dist-packages/django/db/models/base.py:250(__init__)

 ---- By file ----

      tottime
47.0%   3.669 /usr/local/lib/python2.6/dist-packages/django/db/models/base.py
 7.7%   0.601 /usr/local/lib/python2.6/dist-packages/django/db/models/options.py
 6.8%   0.531 /usr/local/lib/python2.6/dist-packages/django/db/models/query_utils.py
 6.6%   0.519 /usr/local/lib/python2.6/dist-packages/django/db/backends/sqlite3/base.py
 6.4%   0.496 /usr/local/lib/python2.6/dist-packages/django/db/models/sql/compiler.py
 5.0%   0.387 /usr/local/lib/python2.6/dist-packages/django/db/models/fields/__init__.py
 3.1%   0.244 /usr/local/lib/python2.6/dist-packages/django/db/backends/util.py
 2.9%   0.225 /usr/local/lib/python2.6/dist-packages/django/db/backends/__init__.py
 2.7%   0.213 /usr/local/lib/python2.6/dist-packages/django/db/models/query.py
 2.2%   0.171 /usr/local/lib/python2.6/dist-packages/django/dispatch/dispatcher.py
 1.7%   0.136 /usr/local/lib/python2.6/dist-packages/django/template/__init__.py
 1.7%   0.131 /usr/local/lib/python2.6/dist-packages/django/utils/datastructures.py
 1.1%   0.088 /usr/lib/python2.6/posixpath.py
 0.8%   0.066 /usr/local/lib/python2.6/dist-packages/django/db/utils.py
...
 ---- By group ---

      tottime
89.5%   6.988 /usr/local/lib/python2.6/dist-packages/django/db
 3.6%   0.279 /usr/local/lib/python2.6/dist-packages/django/utils
...

I can understand why the SQL query could scale with the number of selected entries. However, I don't see why the rest of the CPU time should be in anyway affected. This is very counterintuitive and I was wondering if there's any debugging/profiling tips someone could help me with.

Using django-1.2.3 with sqlite, python2.6, apache2-prefork (though switching to mpm-worker didn't significantly change things). Any tips/tricks would be greatly appreciated. Memory usage doesn't seem to be a factor (machine has 2Gb RAM and free says only using 300Mb in use (additionally 600Mb of cache)) either and the database is on the same server as the machine.

Found my mistake. I found my mistake. I checked the length of the original queryset to see if it was length 1 (and then went to object_detail if so). This resulted in evaluating the full queryset (which still only took 5ms according to django-debug-toolbar), but slowed everything down significantly.

Basically had something stupid like:

    if len(queryset) == 1:                                 
        return HttpResponseRedirect( fwd to object_detail url ...)
    return object_list(request, queryset=queryset, paginate_by=  ...)

which evaluated the full query; not the paginated query.


Solution

  • When django does pagination it will use standard QuerySet slicing to get the results, this means it will use LIMIT and OFFSET.

    You can view the SQL the ORM generates by calling str() on the .query attribute of the QuerySet:

        print MyModel.objects.all().query
        print MyModel.objects.all()[50:100].query
    

    You can then ask sqlite to EXPLAIN the query and see what the database is trying to do. I'm guessing you are sorting on some field that does not have an index. EXPLAIN QUERY PLAN will tell you what indices would have been used, according to the sqlite documentation at http://www.sqlite.org/lang_explain.html