django, download, zip, streaming

What will happen if the reported size of a streaming download is inaccurate?


I implemented a download view in my Django project that builds a zip archive and streams it on the fly.

The archive contains one TSV file and any number of XML files (from a set of search results), organized into a series of directories.

The download works, but the browser shows no progress. We have tested a small download (47 MB) and a large one (3 GB).

It would be nice to have a progress bar to give the user some idea of how long the download will take. From what I've read, though, predicting the size of a zip file is tricky and prone to inaccuracy. Since I'm very inexperienced with zip file generation (let alone streaming downloads), I'm wondering:

  • What would happen with a download from a user perspective if I supply an estimated size? Specifically, would a download fail?
    • What would happen if the estimated size is too big?
    • What would happen if the estimated size is too small?

Are there any alternate solutions for this problem space that I should consider?


Solution

  • To show progress, the browser needs a Content-Length header in the response, and you normally can't send one with a streaming response because you don't know the exact size of the response before you start streaming.

    OK, so what happens if we estimate Content-Length:

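    • If the value is lower than the real length, the download ends early: from the browser's point of view, all the data has been received once it has read that many bytes. The saved file is truncated, and because a zip archive keeps its central directory at the end, it will most likely be unreadable.
    • If the value is higher than the real length, the browser keeps waiting for content that will never arrive. Once the server closes the connection, the browser typically reports the download as failed or incomplete.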
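The two cases above can be reproduced without a browser. The following is a minimal sketch using Python's standard `http.client` as a stand-in for the browser; `FakeSocket` and `read_with_declared_length` are hypothetical helper names, and real browsers may surface the mismatch differently from this client.

```python
import http.client
import io


class FakeSocket:
    """Hypothetical stand-in for a socket that replays a canned HTTP response."""

    def __init__(self, raw):
        self._file = io.BytesIO(raw)

    def makefile(self, *args, **kwargs):
        # http.client only needs a readable binary file object.
        return self._file


def read_with_declared_length(body, declared_length):
    """Parse a response whose Content-Length may not match the real body size."""
    head = (
        "HTTP/1.1 200 OK\r\n"
        f"Content-Length: {declared_length}\r\n"
        "\r\n"
    ).encode("ascii")
    response = http.client.HTTPResponse(FakeSocket(head + body))
    response.begin()  # read the status line and headers
    return response.read()  # read exactly Content-Length bytes, or fail


body = b"hello world"  # 11 bytes actually sent

# Declared length too small: the client stops early and silently truncates.
truncated = read_with_declared_length(body, 5)

# Declared length too large: the stream ends before the promised byte count.
try:
    read_with_declared_length(body, 20)
    outcome = "complete"
except http.client.IncompleteRead as exc:
    outcome = f"incomplete, got {len(exc.partial)} of 20 bytes"
```

In this sketch the underestimate produces no error at all, which is the more dangerous case for a zip download: the user ends up with a file that looks finished but cannot be opened.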

    The solution is to do all the work on the server first, so that you can send the finished file in one go with the Content-Length set properly. Be aware that this can cause a Gateway Timeout if building the archive takes a long time.
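As a rough illustration of the "build first, then send" approach, the sketch below writes the archive to a temporary file with the standard `zipfile` module and measures its exact size before anything is sent. The function name `build_zip_tempfile` and the sample member names are assumptions for illustration, not part of the original project.

```python
import os
import tempfile
import zipfile


def build_zip_tempfile(members):
    """Write (archive_name, bytes) pairs to a zip on disk.

    Returns the open file object, rewound to the start, and its exact size.
    """
    tmp = tempfile.TemporaryFile()  # removed automatically once closed
    with zipfile.ZipFile(tmp, mode="w", compression=zipfile.ZIP_DEFLATED) as archive:
        for archive_name, data in members:
            archive.writestr(archive_name, data)
    # The archive is complete here, so the file size is the real Content-Length.
    size = tmp.seek(0, os.SEEK_END)
    tmp.seek(0)
    return tmp, size


# Hypothetical sample content standing in for the TSV and XML files.
members = [
    ("results.tsv", b"id\tname\n1\texample\n"),
    ("xml/group_a/record_1.xml", b"<record id='1'/>"),
]
archive_file, archive_size = build_zip_tempfile(members)
```

In a Django view, the returned file object could then be handed to `FileResponse(archive_file, as_attachment=True, filename="results.zip")`; recent Django versions should be able to work out Content-Length from a seekable file, and it can also be set explicitly with `response["Content-Length"] = str(archive_size)`. The Gateway Timeout caveat still applies, since nothing is sent until the archive is finished.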