
How do I handle utf-8 vs. punycode issues in Django's csrf middleware?


I have a domain with non-ASCII characters, similar to http://blå.no. The domain is registered under its punycode equivalent:

xn--bl-zia.no
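For reference, Python's built-in `idna` codec does this conversion (shown in Python 3 here; the codec also exists in Python 2). The bare `punycode` codec encodes a single label and produces only the part after the `xn--` prefix. A small sketch:

```python
# Unicode hostname -> ASCII-compatible (IDNA/punycode) form.
ascii_host = "blå.no".encode("idna")
print(ascii_host)  # b'xn--bl-zia.no'

# The bare punycode codec encodes one label, without the "xn--" prefix.
label = "blå".encode("punycode")
print(label)  # b'bl-zia'

# Decoding goes the other way.
print(ascii_host.decode("idna"))  # blå.no
```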

which is also set in the Apache vhost:

<VirtualHost *:443>
    ServerName xn--bl-zia.no
    ...

The problem comes from a request whose META contains:

'HTTP_USER_AGENT': 'Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko',
'HTTP_HOST': 'xn--bl-zia.no',
'SERVER_NAME': 'xn--bl-zia.no',
'HTTP_REFERER': 'https://bl\xc3\xa5.no/login/ka/?next=/start-exam/participant-login/',
'HTTP_X_REQUESTED_WITH': 'XMLHttpRequest',

i.e. the browser sends the host part of the Referer as UTF-8 bytes rather than as punycode. The exception I get is:

Traceback (most recent call last):

  File "/srv/cleanup-project/venv/dev/lib/python2.7/site-packages/django/core/handlers/base.py", line 153, in get_response
    response = callback(request, **param_dict)

  File "/srv/cleanup-project/venv/dev/lib/python2.7/site-packages/django/utils/decorators.py", line 87, in _wrapped_view
    result = middleware.process_view(request, view_func, args, kwargs)

  File "/srv/cleanup-project/venv/dev/lib/python2.7/site-packages/django/middleware/csrf.py", line 157, in process_view
    reason = REASON_BAD_REFERER % (referer, good_referer)

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 10: ordinal not in range(128)
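The error is Python 2's implicit str-to-unicode coercion: `REASON_BAD_REFERER` is presumably a unicode string (Django's `csrf.py` of that era uses `unicode_literals`), so the `%` formatting decodes the byte-string referer with the ASCII codec. Python 3 has no implicit coercion, but a minimal sketch can reproduce the same failure by decoding explicitly; note that byte 0xc3 sits at position 10, right after `https://bl`:

```python
raw_referer = b"https://bl\xc3\xa5.no/login/ka/?next=/start-exam/participant-login/"

try:
    # Python 2 performed this ASCII decode implicitly during % formatting.
    raw_referer.decode("ascii")
except UnicodeDecodeError as exc:
    error = exc
    print(exc)  # 'ascii' codec can't decode byte 0xc3 in position 10: ordinal not in range(128)

# Decoding as UTF-8 works, because that is what the browser actually sent.
print(raw_referer.decode("utf-8"))
```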

The relevant code in csrf.py is:

            good_referer = 'https://%s/' % request.get_host()
            if not same_origin(referer, good_referer):
                reason = REASON_BAD_REFERER % (referer, good_referer)

(get_host() returns the request's Host header, falling back to SERVER_NAME; both hold the punycode name here.)
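Even without the Unicode error, the check would fail: Django's `same_origin` compares scheme, host and port, and the decoded referer host `blå.no` is not the same string as `xn--bl-zia.no`. The sketch below (Python 3) approximates that comparison; the `origin` helper and its `normalize_idn` flag are hypothetical and not part of Django. It shows that the two URLs only match once both hosts are put into the same IDNA form:

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url, normalize_idn=False):
    """Return (scheme, host, port) for url, optionally IDNA-encoding the host."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # urlsplit lowercases the hostname
    if normalize_idn:
        # Hypothetical extra step; Django's own check does not do this.
        host = host.encode("idna").decode("ascii")
    return (parts.scheme, host, parts.port or DEFAULT_PORTS.get(parts.scheme))


referer = "https://blå.no/login/ka/?next=/start-exam/participant-login/"
good_referer = "https://xn--bl-zia.no/"

print(origin(referer) == origin(good_referer))  # False
print(origin(referer, normalize_idn=True) == origin(good_referer, normalize_idn=True))  # True
```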

Is there a built-in Django way to handle this, or do I need to write a middleware that converts the UTF-8 domain part of the Referer header to punycode?


Solution

  • Here's a middleware solution:

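    import urlparse


    class PunyCodeU8RefererFixerMiddleware(object):
        """Rewrite a UTF-8 hostname in the Referer header to punycode, so the
        CSRF referer check sees the same host form that request.get_host()
        returns (Python 2, old-style middleware)."""

        def process_request(self, request):
            # Only relevant when the site itself is served under an IDN.
            servername = request.META.get('SERVER_NAME', '')
            if 'xn--' not in servername:
                return None

            referer = request.META.get("HTTP_REFERER")
            if not referer:
                return None

            url = urlparse.urlparse(referer)
            try:
                # The browser sent the host as raw UTF-8 bytes.
                netloc = url.netloc.decode('utf-8')
            except UnicodeDecodeError:
                return None

            def isascii(txt):
                return all(ord(ch) < 128 for ch in txt)

            # Punycode-encode each non-ASCII label; ASCII labels (including
            # a trailing ':port') are kept unchanged.
            netloc = '.'.join([
                str(p) if isascii(p) else 'xn--' + p.encode('punycode')
                for p in netloc.split('.')
            ])
            url = url._replace(netloc=netloc)
            request.META['HTTP_REFERER'] = urlparse.urlunparse(url)
            return None
    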

    It bails out as early as possible when it detects that it can't do anything useful. It must, of course, be installed before the CSRF middleware.
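On Python 3 with new-style middleware, the same idea could look roughly like the sketch below. It assumes the WSGI server follows PEP 3333, so the header arrives as a `str` whose characters are the raw bytes decoded as latin-1 (which is why the referer's repr looks like `bl\xc3\xa5`). The names `fix_idn_referer` and `IDNARefererMiddleware` are hypothetical, and the sketch swaps the manual per-label punycode step for Python's `idna` codec:

```python
from urllib.parse import urlsplit, urlunsplit


def fix_idn_referer(meta):
    """Rewrite a UTF-8 hostname in meta['HTTP_REFERER'] to its IDNA form.

    Assumes PEP 3333 semantics: header values are str objects holding the
    raw header bytes decoded as latin-1.
    """
    if "xn--" not in meta.get("SERVER_NAME", ""):
        return  # the site is not served under an IDN; nothing to do
    referer = meta.get("HTTP_REFERER")
    if not referer:
        return

    parts = urlsplit(referer)
    try:
        # Recover the raw bytes, decode them as UTF-8, then IDNA-encode.
        host = parts.netloc.encode("latin-1").decode("utf-8")
        netloc = host.encode("idna").decode("ascii")
    except UnicodeError:
        return  # not UTF-8, or not a valid IDN: leave the header alone
    meta["HTTP_REFERER"] = urlunsplit(parts._replace(netloc=netloc))


class IDNARefererMiddleware:
    """New-style (Django >= 1.10) middleware wrapping fix_idn_referer."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        fix_idn_referer(request.META)
        return self.get_response(request)
```

Keeping the conversion in a plain function makes it testable without a Django request object; the middleware class would go in `MIDDLEWARE` ahead of `django.middleware.csrf.CsrfViewMiddleware`.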