Below is part of the Python code running on Google App Engine. It fetches a file from Google Cloud Storage using the cloudstorage client library.
The problem is that each time the code reads a big file (about 10 MB), the memory used by the instance grows linearly. Soon the process is terminated with "Exceeded soft private memory limit of 128 MB with 134 MB after servicing 40 requests total".
import webapp2
import cloudstorage as gcs

class ReadGSFile(webapp2.RequestHandler):
    def get(self):
        self.response.headers['Content-Type'] = "file type"
        read_path = "path/to/file"

        # Read and relay the file in ~1 MB chunks.
        with gcs.open(read_path, 'r') as fp:
            buf = fp.read(1000000)
            while buf:
                self.response.out.write(buf)
                buf = fp.read(1000000)
        # No explicit fp.close() needed; the with statement closes the file.
If I comment out the following line, the memory usage of the instance no longer grows, so the problem appears to be in webapp2:
    self.response.out.write(buf)
Webapp2 is supposed to release the memory once the response is finished, but in my code it does not.
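This behavior is consistent with webapp2 buffering the whole response body in memory: everything written to self.response.out is appended to an in-memory body and only sent to the client after the handler returns, so reading in chunks does not cap the per-request memory. A minimal illustrative sketch (the handler name and sizes are made up):

import webapp2

class BufferDemo(webapp2.RequestHandler):
    def get(self):
        # Each write is appended to webapp2's in-memory response body;
        # nothing is flushed to the client while the handler is running.
        for _ in range(10):
            self.response.out.write('x' * 1000000)
        # At this point roughly 10 MB sit in the response buffer until
        # the request finishes.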
As suggested in user voscausa's comment above, I changed the scheme for file downloading: the file is now served through the Blobstore download handler. This solved the memory leak.
import urllib2

from google.appengine.ext import blobstore
from google.appengine.ext.webapp import blobstore_handlers

class GCSServingHandler(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self):
        # GCS path of the form "/bucket/object"; do not prefix it with "/gs".
        read_path = "/path/to/gcs file/"
        blob_key = blobstore.create_gs_key("/gs" + read_path)

        f_name = "file name"
        f_type = "file type"  # such as 'text/plain'
        self.response.headers['Content-Type'] = f_type
        self.response.headers['Content-Disposition'] = \
            "attachment; filename=\"%s\";" % f_name
        self.response.headers['Content-Disposition'] += \
            " filename*=utf-8''" + urllib2.quote(f_name.encode("utf8"))

        # send_blob() lets App Engine serve the file directly, so the
        # instance itself does not buffer the file contents.
        self.send_blob(blob_key)
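For completeness, the handler is wired up like any other webapp2 handler; the URL pattern below is just an example:

import webapp2

app = webapp2.WSGIApplication([
    ('/download', GCSServingHandler),
], debug=False)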