Tags: python, http, file-upload, flask, tornado

MemoryError when uploading large files to Tornado HTTP server


I am working on a Flask application that will handle large file uploads (250 MB+). I am running this application using Tornado with multiple processes so it can handle concurrent requests without blocking.

import os.path
import tempfile
from flask import Flask, request, jsonify
from tornado.wsgi import WSGIContainer
from tornado.httpserver import HTTPServer
from tornado.ioloop import IOLoop
from werkzeug.utils import secure_filename

app = Flask(__name__)

@app.route("/test", methods=["GET"])
def test_route():

    return jsonify(msg='Ok'), 200

@app.route("/upload", methods=["GET", "POST"])
def upload_file():

    if request.method == 'POST':
        temp_directory = app.config['TMP_DIRECTORY']
        uploaded_file = request.files['filename']
        filename = secure_filename(uploaded_file.filename)
        uploaded_file.save(os.path.join(temp_directory, filename))
        return jsonify(msg="File upload successfully"), 200

    else:
        return jsonify(msg="Use POST to upload a file"), 200


if __name__ == '__main__':
    app.config['TMP_DIRECTORY'] = tempfile.mkdtemp()
    address = '0.0.0.0'
    port = 8000

    # Tornado buffers the entire request body in memory before handing it
    # to the WSGI application; this raises the cap to 500 MB.
    max_buffer_size = 500 * 1024 * 1024
    server = HTTPServer(WSGIContainer(app), max_buffer_size=max_buffer_size)
    server.bind(port=port, address=address)

    print("Starting Tornado server on %s:%s" % (address, port))
    server.start(2)  # fork two worker processes
    IOLoop.instance().start()

I am getting the following MemoryError when uploading multiple large files at the same time:

$ curl -i -F name=file -F filename=@bigfile.iso http://127.0.0.1:8000/upload

ERROR:tornado.application:Uncaught exception
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/tornado/http1connection.py", line 238, in _read_message
    delegate.finish()
  File "/usr/lib64/python2.7/site-packages/tornado/httpserver.py", line 285, in finish
    self.request.body = b''.join(self._chunks)
MemoryError

I believe Tornado buffers the entire uploaded file in memory and only hands it to the application (which then writes it to disk) once the client has completed the upload. Is it possible to change this behavior so that chunks are written to disk as they arrive?


Solution

  • You are misunderstanding how Tornado works. It doesn't magically make your Flask app "able to handle concurrent requests without blocking": running Flask inside Tornado's WSGIContainer is actually less scalable than running it on a dedicated WSGI server such as uWSGI or Gunicorn. See the warning in WSGIContainer's documentation, and the example command below.

    If you were doing this as a native Tornado application (without Flask), you could use the tornado.web.stream_request_body decorator to handle large uploads without buffering the whole body in memory; a sketch follows below.
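
    For the first point, switching to a dedicated WSGI server is a one-line change on the command line. As an illustration only (the module name app:app and the worker count are assumptions, not something from the original post):

$ gunicorn --workers 4 --bind 0.0.0.0:8000 app:app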
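
    For the second point, here is a minimal sketch of such a native Tornado handler. It is a sketch under an assumption: the client sends the raw file bytes as the request body (e.g. with curl --data-binary) rather than a multipart form, because parsing multipart boundaries incrementally is beyond the scope of this example. The handler name, port, and size limit are illustrative.

import tempfile

import tornado.ioloop
import tornado.web

@tornado.web.stream_request_body
class UploadHandler(tornado.web.RequestHandler):
    def prepare(self):
        # Called once the headers have arrived, before any body data;
        # open a file on disk to stream the body into.
        self.temp_file = tempfile.NamedTemporaryFile(delete=False)

    def data_received(self, chunk):
        # Called repeatedly as body chunks arrive; only the current
        # chunk is ever held in memory.
        self.temp_file.write(chunk)

    def post(self):
        # Called after the final chunk has been received.
        self.temp_file.close()
        self.write({"msg": "File uploaded successfully",
                    "path": self.temp_file.name})

if __name__ == "__main__":
    application = tornado.web.Application([(r"/upload", UploadHandler)])
    # For a streaming handler, max_body_size (not max_buffer_size) is
    # what limits the total upload size.
    application.listen(8000, max_body_size=500 * 1024 * 1024)
    tornado.ioloop.IOLoop.current().start()

    The client would then upload with something like:

$ curl -i --data-binary @bigfile.iso http://127.0.0.1:8000/upload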