I am working on a Flask application that will handle large file uploads (250 MB+). I am running this application using Tornado with multiple processes so it can handle concurrent requests without blocking.
import os.path
import tempfile

from flask import Flask, request, jsonify
from tornado.wsgi import WSGIContainer
from tornado.httpserver import HTTPServer
from tornado.ioloop import IOLoop
from werkzeug.utils import secure_filename

app = Flask(__name__)

@app.route("/test", methods=["GET"])
def test_route():
    return jsonify(msg='Ok'), 200

@app.route("/upload", methods=["GET", "POST"])
def upload_file():
    if request.method == 'POST':
        temp_directory = app.config['TMP_DIRECTORY']
        uploaded_file = request.files['filename']
        filename = secure_filename(uploaded_file.filename)
        uploaded_file.save(os.path.join(temp_directory, filename))
        return jsonify(msg="File uploaded successfully"), 200
    else:
        return jsonify(msg="Use POST to upload a file"), 200

if __name__ == '__main__':
    app.config['TMP_DIRECTORY'] = tempfile.mkdtemp()
    address = '0.0.0.0'
    port = 8000
    max_buffer_size = 500 * 1024 * 1024  # allow request bodies up to 500 MB
    server = HTTPServer(WSGIContainer(app), max_buffer_size=max_buffer_size)
    server.bind(port=port, address=address)
    print("Starting Tornado server on %s:%s" % (address, port))
    server.start(2)  # fork two worker processes
    IOLoop.instance().start()
I am getting the following MemoryError when uploading multiple large files at the same time:
$ curl -i -F name=file -F filename=@bigfile.iso http://127.0.0.1:8000/upload
ERROR:tornado.application:Uncaught exception
Traceback (most recent call last):
File "/usr/lib64/python2.7/site-packages/tornado/http1connection.py", line 238, in _read_message
delegate.finish()
File "/usr/lib64/python2.7/site-packages/tornado/httpserver.py", line 285, in finish
self.request.body = b''.join(self._chunks)
MemoryError
I believe Tornado buffers the entire uploaded file in memory and only hands it off to the application (which then writes it to disk) once the client has completed the upload. Is it possible to modify this behavior so that chunks are written out to disk as they arrive?
You are misunderstanding how Tornado works. It doesn't magically make your Flask app "able to handle concurrent requests without blocking": running Flask inside Tornado's WSGIContainer is actually less scalable than running Flask on a dedicated WSGI server like uwsgi or gunicorn. See the warning in WSGIContainer's documentation.
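For comparison, running the Flask app under gunicorn is a one-liner (a sketch, assuming the module above is saved as app.py; the worker count is arbitrary):

$ gunicorn --workers 4 --bind 0.0.0.0:8000 app:app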
If you were doing this as a native Tornado application (without Flask), then you could use the tornado.web.stream_request_body decorator to handle large uploads without buffering the whole thing in memory.
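Here is a minimal sketch of such a handler (the /upload_stream route, handler name, and 500 MB limit are illustrative choices, not part of Tornado's API):

import os
import tempfile

import tornado.ioloop
import tornado.web

@tornado.web.stream_request_body
class StreamingUploadHandler(tornado.web.RequestHandler):
    def prepare(self):
        # By default Tornado rejects bodies over 100 MB; raise the limit
        # for this request before the body starts arriving.
        self.request.connection.set_max_body_size(500 * 1024 * 1024)
        # Open a temp file up front; chunks are appended as they arrive.
        fd, self.temp_path = tempfile.mkstemp()
        self.temp_file = os.fdopen(fd, "wb")

    def data_received(self, chunk):
        # Called repeatedly with pieces of the body as they come off the
        # socket, so the whole upload is never held in memory at once.
        self.temp_file.write(chunk)

    def post(self):
        self.temp_file.close()
        self.write({"msg": "File uploaded successfully", "path": self.temp_path})

if __name__ == '__main__':
    application = tornado.web.Application([(r"/upload_stream", StreamingUploadHandler)])
    application.listen(8000, address='0.0.0.0')
    tornado.ioloop.IOLoop.instance().start()

One caveat: data_received sees the raw request body, so a multipart/form-data POST like your curl example will include the multipart boundaries and part headers in the chunks. You either need to parse that framing yourself or have the client send the raw bytes (e.g. with an HTTP PUT).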