Huge Django Session table, normal behaviour or bug?


Perhaps this is completely normal behaviour, but I feel like the django_session table is much larger than it should have to be.

First of all, I run the following cleanup command daily so the size is not caused by expired sessions:

DELETE FROM %s WHERE expire_date < NOW()

The numbers:

  • We've got about 5000 unique visitors (bots excluded) every day.
  • The SESSION_COOKIE_AGE is set to the default, 2 weeks
  • The table has a little over 1,000,000 rows
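
As a rough sanity check on these numbers, here is a back-of-the-envelope sketch. It assumes the worst case for real visitors, where every daily visitor gets a brand-new session that lives for the full cookie age; the variable names are illustrative only.

```python
# Figures taken from the list above.
visitors_per_day = 5000
session_age_days = 14          # default SESSION_COOKIE_AGE of 2 weeks
rows_in_table = 1000000        # "a little over", so this is a lower bound

# Assumed worst case: every visitor creates a fresh session each day and
# none of those sessions expires before the full cookie age has passed.
expected_max_rows = visitors_per_day * session_age_days

# How many times larger the real table is than that upper bound.
excess_factor = rows_in_table / float(expected_max_rows)

print(expected_max_rows)
print(round(excess_factor, 1))
```

Even under that pessimistic assumption the table would hold about 70,000 rows, so human visitors alone cannot account for a table more than ten times that size.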

So, I'm guessing that Django also generates session keys for all bots that visit the site, and since the bots don't store the cookies, it keeps generating new sessions for them.

But... is this normal behaviour? Is there a setting so Django won't generate sessions for anonymous users, or at least no sessions for users that aren't using sessions?


Solution

  • After a bit of debugging I've managed to trace the cause of the problem. One of my middlewares (and most of my views) calls request.user.is_authenticated().

    The django.contrib.auth middleware sets request.user to LazyUser()

    Source: http://code.djangoproject.com/browser/django/trunk/django/contrib/auth/middleware.py?rev=14919#L13 (I don't see why there is a return None there, but ok...)

    class AuthenticationMiddleware(object):
        def process_request(self, request):
            assert hasattr(request, 'session'), "The Django authentication middleware requires session middleware to be installed. Edit your MIDDLEWARE_CLASSES setting to insert 'django.contrib.sessions.middleware.SessionMiddleware'."
            request.__class__.user = LazyUser()
            return None
    

    The LazyUser calls get_user(request) to get the user:

    Source: http://code.djangoproject.com/browser/django/trunk/django/contrib/auth/middleware.py?rev=14919#L5

    class LazyUser(object):
        def __get__(self, request, obj_type=None):
            if not hasattr(request, '_cached_user'):
                from django.contrib.auth import get_user
                request._cached_user = get_user(request)
            return request._cached_user
    

    The get_user(request) function reads user_id = request.session[SESSION_KEY]:

    Source: http://code.djangoproject.com/browser/django/trunk/django/contrib/auth/__init__.py?rev=14919#L100

    def get_user(request):
        from django.contrib.auth.models import AnonymousUser
        try:
            user_id = request.session[SESSION_KEY]
            backend_path = request.session[BACKEND_SESSION_KEY]
            backend = load_backend(backend_path)
            user = backend.get_user(user_id) or AnonymousUser()
        except KeyError:
            user = AnonymousUser()
        return user
    

    Accessing the session sets accessed to True:

    Source: http://code.djangoproject.com/browser/django/trunk/django/contrib/sessions/backends/base.py?rev=14919#L183

    def _get_session(self, no_load=False):
        """
        Lazily loads session from storage (unless "no_load" is True, when only
        an empty dict is stored) and stores it in the current instance.
        """
        self.accessed = True
        try:
            return self._session_cache
        except AttributeError:
            if self._session_key is None or no_load:
                self._session_cache = {}
            else:
                self._session_cache = self.load()
        return self._session_cache
    

    And that causes the session to be initialized. The bug itself was in a faulty session backend, which also generated a session whenever accessed was set to True...
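
To illustrate the difference, here is a minimal sketch that runs without Django. SketchSession, stock_should_save and faulty_should_save are hypothetical names, not Django APIs. The stock rule is modelled on the session middleware saving only when the session was modified or SESSION_SAVE_EVERY_REQUEST is set; the faulty rule is only an assumption about what the broken backend did.

```python
class SketchSession(object):
    """Hypothetical, simplified stand-in for a session object."""

    def __init__(self):
        self.accessed = False
        self.modified = False
        self._data = {}

    def __getitem__(self, key):
        # Any read marks the session as accessed, even if the key is missing.
        self.accessed = True
        return self._data[key]

    def __setitem__(self, key, value):
        # Only writes mark the session as modified.
        self.accessed = True
        self.modified = True
        self._data[key] = value


def stock_should_save(session, save_every_request=False):
    # Modelled on the stock rule: persist only when something changed.
    return session.modified or save_every_request


def faulty_should_save(session):
    # Assumed behaviour of the faulty backend: persist on mere access.
    return session.accessed


# An anonymous request: the auth lookup reads a key that does not exist.
# The key name below is illustrative.
session = SketchSession()
try:
    session['_auth_user_id']
except KeyError:
    pass  # get_user() would fall back to AnonymousUser here

print(stock_should_save(session))   # False: no row is written
print(faulty_should_save(session))  # True: one row per anonymous request
```

With the stock rule, calling request.user.is_authenticated() for an anonymous visitor or a bot should not create a row by itself; only the access-triggered save would.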