Google App Engine + Python: uploading to Blobstore causes wrong encoding


I tried to upload blobs to Google App Engine's blobstore using the following HTML form:

<!DOCTYPE html>
<html>
<head>
<meta charset=utf-8>
</head>
<body>
<form id=upload action={{upload_url}} method=post enctype=multipart/form-data>
  Name: <input type=text name=name>
  Your photo: <input type=file name=image required=required><br><br>
  <input type=submit value=submit>
</form>
</body>
</html>

The value of the template variable {{upload_url}} is obtained by upload_url = blobstore.create_upload_url('/upload') on the server side. The post-handling script is as follows:

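    from google.appengine.ext import ndb

    class Test(ndb.Model):
        name = ndb.StringProperty()
        image = ndb.StringProperty()

    # Inside post() of the blobstore_handlers.BlobstoreUploadHandler subclass:
    test = Test()
    test.name = self.request.get('name')
    image = self.get_uploads('image')[0]  # BlobInfo of the uploaded file
    test.image = str(image.key())         # store the blob key as a string
    test.put()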

Usually, the name field is filled with non-English characters (e.g., Chinese). The above program works fine on my local SDK. However, the name is incorrectly encoded when the program runs on Google App Engine. What is the problem?


Solution

  • I just found out that this is a long-standing bug; see the issue linked in the docstring of the code below. There are two solutions:

    (1) Add the following lines to app.yaml:

    libraries:
    - name: webob
      version: "1.2.3"
    

    (2) Add the file appengine_config.py with the following content:

    # -*- coding: utf-8 -*-
    from webob import multidict
    
    def from_fieldstorage(cls, fs):
        """Create a dict from a cgi.FieldStorage instance.
        See this for more details:
        http://code.google.com/p/googleappengine/issues/detail?id=2749
        """
        import base64
        import quopri
    
        obj = cls()
        if fs.list:
            # fs.list can be None when there's nothing to parse
            for field in fs.list:
                if field.filename:
                    obj.add(field.name, field)
                else:
                    # first, set a common charset to utf-8.
                    common_charset = 'utf-8'
                    # second, check Content-Transfer-Encoding and decode
                    # the value appropriately
                    field_value = field.value
                    transfer_encoding = field.headers.get('Content-Transfer-Encoding', None)
                    if transfer_encoding == 'base64':
                        field_value = base64.b64decode(field_value)
                    if transfer_encoding == 'quoted-printable':
                        field_value = quopri.decodestring(field_value)
                    if 'charset' in field.type_options and field.type_options['charset'] != common_charset:
                        # decode with a charset specified in each
                        # multipart, and then encode it again with a
                        # charset specified in top level FieldStorage
                        field_value = field_value.decode(field.type_options['charset']).encode(common_charset)
                    # Add the field whether or not it was re-encoded.
                    # TODO: Should we take care of field.name here?
                    obj.add(field.name, field_value)
        return obj
    
    multidict.MultiDict.from_fieldstorage = classmethod(from_fieldstorage)
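
To see what the two decoding steps in the patch above do, here is a small standalone sketch. It is written for Python 3 with only the standard library, so it runs outside App Engine. The helper name decode_field_value and the sample values are hypothetical and are not part of webob or the SDK. It assumes, as the patch does, that a text field may arrive with a Content-Transfer-Encoding of base64 or quoted-printable and with its own charset.

```python
import base64
import quopri


def decode_field_value(raw, transfer_encoding=None, charset=None,
                       common_charset="utf-8"):
    """Return one form field's value as bytes in common_charset.

    raw is the undecoded body of a single multipart part (bytes).
    Hypothetical helper that mirrors the logic of the patch above.
    """
    value = raw
    # Step 1: undo the Content-Transfer-Encoding, if there is one.
    if transfer_encoding == "base64":
        value = base64.b64decode(value)
    elif transfer_encoding == "quoted-printable":
        value = quopri.decodestring(value)
    # Step 2: re-encode to the common charset if the part used another one.
    if charset and charset.lower() != common_charset:
        value = value.decode(charset).encode(common_charset)
    return value


name = "中文"
utf8 = name.encode("utf-8")

# The same name as it might arrive in three differently encoded parts.
as_base64 = base64.b64encode(utf8)
as_qp = quopri.encodestring(utf8)
as_gbk = name.encode("gbk")

for raw, cte, cs in [(as_base64, "base64", None),
                     (as_qp, "quoted-printable", None),
                     (as_gbk, None, "gbk")]:
    print(decode_field_value(raw, cte, cs).decode("utf-8"))
```

If step 1 is skipped, the handler sees the base64 or quoted-printable text itself instead of the name, which is one way the stored value can end up garbled.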