python, google-app-engine, memory-leaks, google-cloud-storage, webapp2

Memory leak when reading files from Google Cloud Storage on Google App Engine (Python)


Below is part of the Python code running on Google App Engine. It fetches a file from Google Cloud Storage using the cloudstorage client.

The problem is that each time the code reads a large file (about 10 MB), the memory used by the instance increases linearly. Soon the process is terminated with "Exceeded soft private memory limit of 128 MB with 134 MB after servicing 40 requests total".

import cloudstorage as gcs
import webapp2


class ReadGSFile(webapp2.RequestHandler):
    def get(self):
        self.response.headers['Content-Type'] = "file type"
        read_path = "path/to/file"

        # Copy the file to the response in 1 MB chunks.
        with gcs.open(read_path, 'r') as fp:
            buf = fp.read(1000000)
            while buf:
                self.response.out.write(buf)
                buf = fp.read(1000000)
            # No explicit fp.close() is needed: the with block closes the file.

If I comment out the following line, the instance's memory usage no longer grows, so the problem appears to lie in webapp2.

  self.response.out.write(buf)

webapp2 is supposed to release the memory once the response has finished, but in my code it does not.
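The effect of that write call can be illustrated with a minimal sketch. It assumes the response object keeps every written chunk in memory until the handler returns; that is an assumption about webapp2's behavior, not something confirmed here. `BufferedResponse` and `stream_in_chunks` are hypothetical names, and `io.BytesIO` stands in for the GCS file.

```python
import io

CHUNK_SIZE = 1000000  # same chunk size as the handler above


class BufferedResponse(object):
    """Hypothetical stand-in for a response that buffers its whole body."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        # Each chunk is retained, so memory grows with the total bytes written.
        self.chunks.append(data)

    def buffered_bytes(self):
        return sum(len(chunk) for chunk in self.chunks)


def stream_in_chunks(fp, response, chunk_size=CHUNK_SIZE):
    """Copy fp to response chunk by chunk and return the number of reads."""
    reads = 0
    buf = fp.read(chunk_size)
    while buf:
        response.write(buf)
        reads += 1
        buf = fp.read(chunk_size)
    return reads


fake_file = io.BytesIO(b"x" * (10 * CHUNK_SIZE))  # stands in for a ~10 MB object
response = BufferedResponse()
reads = stream_in_chunks(fake_file, response)
print(reads, response.buffered_bytes())
```

Under that assumption, reading in chunks limits the size of each read but not the memory held per request, which still reaches the full file size by the time the handler returns.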


Solution

  • Following the suggestion in user voscausa's comment above, I changed the download scheme so that the file is served through Blobstore. This solved the memory leak.

    Reference: https://cloud.google.com/appengine/docs/python/blobstore/#Python_Using_the_Blobstore_API_with_Google_Cloud_Storage

    import urllib2

    from google.appengine.ext import blobstore
    from google.appengine.ext.webapp import blobstore_handlers
    
    class GCSServingHandler(blobstore_handlers.BlobstoreDownloadHandler):
      def get(self):
        read_path = "/path/to/gcs file/"  # Should not already start with "/gs"
        blob_key  = blobstore.create_gs_key("/gs" + read_path)  # read_path supplies the "/" after "/gs"
    
        f_name = "file name"
        f_type = "file type" # Such as 'text/plain'
    
        self.response.headers['Content-Type'] = f_type
        self.response.headers['Content-Disposition'] = "attachment; filename=\"%s\";"%f_name
        self.response.headers['Content-Disposition'] += " filename*=utf-8''" + urllib2.quote(f_name.encode("utf8"))
    
        self.send_blob(blob_key)
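The two Content-Disposition lines in the handler can be factored into a small helper, sketched below. `content_disposition` is a hypothetical name, not part of webapp2 or Blobstore, and the import fallback is only there so the sketch runs on both Python 2 and Python 3.

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2, the runtime used in this answer


def content_disposition(f_name):
    """Build an attachment header with a plain and a UTF-8 encoded file name."""
    value = 'attachment; filename="%s";' % f_name
    # filename* carries the percent-encoded UTF-8 form of the same name.
    value += " filename*=utf-8''" + quote(f_name.encode("utf8"))
    return value


print(content_disposition(u"report 2015.txt"))
```

In the handler, the result would be assigned once to self.response.headers['Content-Disposition'] instead of building the value in two steps.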