Tags: python-2.7, utf-8, response, turbogears2

Turbogears response - sending a utf-8 filename


I'm working on a web app using TurboGears 2.3.3.
In my app, users get a set of directions and need to download some files accordingly.
It is important that they can download the files under their original names, which will be in UTF-8. Here is my method for downloading the files:

import os
from webob.static import FileApp
from tg import expose, request, use_wsgi_app, response

....

@expose()
def download(self,**kw):
    response.headerlist.append(('Content-Disposition','attachment'))
    path_to_file = os.path.join(os.path.dirname(dfuswebapp.__file__), 'PrintFiles')
    file_with_path = os.path.join(path_to_file,kw['filename'])
    file = FileApp(file_with_path)
    return use_wsgi_app(file)

When I try to download a file this way, the file name is "download" with the extension of the original file.

If I try this code:

response.headerlist.append(('Content-Disposition','attachment;filename=%s'%str(kw['filename']))) 

I get an error if kw['filename'] is in UTF-8, which most of my file names will be. Is there a way to keep the original file names?

Thanks for the help


Solution

  • Sadly, you have hit one of the many dark corners of WSGI and HTTP. As stated by the WSGI specification:

    Note also that strings passed to start_response() as a status or as response headers must follow RFC 2616 with respect to encoding. That is, they must either be ISO-8859-1 characters, or use RFC 2047 MIME encoding.

    That means your headers must be encoded as latin-1 or use RFC 2047. The issue is that browsers do not handle this reliably, so support for headers outside latin-1 has been left out of webob so far (see https://github.com/Pylons/webob/issues/11#issuecomment-2819811).

    The best solution is probably to encode the Content-Disposition header manually following RFC 6266, which provides the filename* parameter for Unicode names using percent-encoding. The result is plain ASCII and therefore fully latin-1 compliant, which keeps WSGI happy, while still representing UTF-8 characters.

    Here is a short example that yields "EURO rates" or "€ rates" depending on the browser:

    Content-Disposition: attachment;
                              filename="EURO rates";
                              filename*=utf-8''%e2%82%ac%20rates
    

    See also this Stack Overflow post for a discussion of the issue: How to encode the filename parameter of Content-Disposition header in HTTP?
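
As a rough illustration of the RFC 6266 approach described above, the header value could be built along these lines. This is only a sketch: the helper name content_disposition is hypothetical (it is not part of TurboGears or webob), it assumes the file name arrives as a unicode string, and the "?" fallback for old browsers is just one possible choice.

```python
# -*- coding: utf-8 -*-
try:
    from urllib import quote        # Python 2
except ImportError:
    from urllib.parse import quote  # Python 3


def content_disposition(filename):
    """Return an attachment header value for a unicode file name.

    Hypothetical helper: emits both a plain ASCII ``filename`` fallback
    and a percent-encoded ``filename*`` parameter.
    """
    # Fallback for browsers that ignore filename*: non-ASCII characters
    # become "?", and characters that would break the quoted string
    # (backslash, double quote) are replaced with "_".
    fallback = filename.encode('ascii', 'replace').decode('ascii')
    fallback = fallback.replace('\\', '_').replace('"', '_')

    # Percent-encode the UTF-8 bytes; safe='' also encodes "/" and spaces.
    encoded = quote(filename.encode('utf-8'), safe='')

    # The result is pure ASCII, so str() yields a native string on both
    # Python 2 and Python 3, which is what WSGI expects for headers.
    return str("attachment; filename=\"%s\"; filename*=UTF-8''%s"
               % (fallback, encoded))


print(content_disposition(u"\u20ac rates"))
# attachment; filename="? rates"; filename*=UTF-8''%E2%82%AC%20rates
```

In the download method from the question, the value could then be set with response.headerlist.append(('Content-Disposition', content_disposition(kw['filename']))), assuming kw['filename'] has already been decoded to unicode. Because the value contains only ASCII characters, it satisfies the latin-1 restriction quoted from the WSGI specification.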