I've got a proxy written in Django which receives requests for certain files. After deciding whether the user is allowed to see the file the proxy gets the file from a remote service and serves it to the user. There's a bit more to it but this is the gist.
This setup works great for single files, but there is a new requirement: users want to download multiple files together as a zip. The files are sometimes small, but can also be really large (100 MB or more), and a single download can contain anywhere from 2 to 1000 files. Fetching all those files, zipping them, and then serving the archive within the same request could become a heavy burden.
I read about the possibility of creating "streaming zips": a way to open a zip and then keep sending the files in that zip until you close it. I found a couple of PHP examples and, in Python, the django-zip-stream extension. They all assume locally stored files, and the Django extension also assumes the use of nginx.
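To make the idea concrete, here is a minimal sketch of a streaming zip using only Python's standard library. It is not how django-zip-stream works; it relies on `zipfile` being able to write to an unseekable stream. The names `stream_zip` and `_ChunkBuffer` are made up for illustration, and the in-memory chunk lists stand in for whatever reads from the remote service.

```python
import zipfile


class _ChunkBuffer:
    """Hypothetical write-only sink: has no tell()/seek(), so zipfile
    treats it as an unseekable stream and never tries to rewind."""

    def __init__(self):
        self._chunks = []

    def write(self, data):
        self._chunks.append(bytes(data))
        return len(data)

    def flush(self):
        pass

    def drain(self):
        # Hand over everything written so far and forget it.
        data = b"".join(self._chunks)
        self._chunks.clear()
        return data


def stream_zip(sources):
    """Yield the bytes of a zip archive piece by piece.

    `sources` is an iterable of (archive_name, iterable_of_byte_chunks).
    Entries are stored uncompressed to keep CPU usage low (an assumption;
    ZIP_DEFLATED would also work at a higher CPU cost).
    """
    buf = _ChunkBuffer()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        for name, chunks in sources:
            with zf.open(name, "w") as entry:
                for chunk in chunks:
                    entry.write(chunk)
                    data = buf.drain()
                    if data:
                        yield data
    # Closing the ZipFile writes the central directory; send that last.
    tail = buf.drain()
    if tail:
        yield tail
```

Only one chunk per file is held in memory at a time, so memory use should not grow with the archive size. In Django, a generator like this could presumably be passed to `StreamingHttpResponse`; entries larger than 2 GiB would additionally need `force_zip64=True` in `zf.open`.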
There are a couple things I wonder about in my situation:
Does anybody know whether streaming zips are a good idea with my setup of very large remote files? I'm a bit afraid that many simultaneous requests could easily DoS our servers by exhausting CPU or memory.
I can also build a queue which zips the files and sends an email to the user, but if possible I'd like to keep the application as stateless as possible.
This sounds to me like a perfect use case for queueing jobs and processing them in the background.
Advantages:
The second advantage is particularly desirable since you’re prepared to receive multiple concurrent requests.
I would also consider using a “task” Django model with a FileField as a container for the resulting zip file, so that it can be served statically and efficiently by Nginx from the media folder. As an additional benefit, you can monitor what’s going on directly from the Django admin user interface.
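As a rough, framework-free illustration of that idea, the sketch below uses the standard library's `queue` and `threading` in place of a real task queue, and a plain dataclass in place of the Django model. `ZipJob`, `worker`, and the `files` mapping (which stands in for fetching from the remote service) are hypothetical names, not part of django-task.

```python
import queue
import threading
import uuid
import zipfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class ZipJob:
    """Hypothetical stand-in for the "task" model: tracks status and result."""
    files: dict  # archive name -> bytes (stand-in for remote fetches)
    job_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    status: str = "pending"
    result: Optional[Path] = None  # plays the role of the FileField


def worker(jobs, out_dir):
    """Process jobs one at a time until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        job.status = "running"
        try:
            target = out_dir / f"{job.job_id}.zip"
            with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as zf:
                for name, data in job.files.items():
                    zf.writestr(name, data)
            job.result = target
            job.status = "done"
        except Exception:
            job.status = "failed"
        finally:
            jobs.task_done()
```

Because a single worker drains the queue, the number of zips being built at once stays bounded no matter how many requests arrive; the request handler only has to enqueue a job and return.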
I’ve used a similar approach in many Django projects, and it has proven to be quite robust and manageable; you might want to take a quick look at the following Django app I’m using for that: https://github.com/morlandi/django-task
To summarize: