http stream zlib

Stream an HTTP POST multipart/form-data upload through compression and into storage while it is uploading?


Background

I want to reduce the memory and temporary storage footprint of a service that takes a file, zips it and stores it away somewhere. Let's say the memory limit is 4 GB, the disk storage limit is 512 MB, and the files processed can be 10 GB.

Question

  • Is it possible to stream the file, while it is being uploaded over HTTP, through zlib (or something else)? Or is it a limitation of the HTTP protocol that the file has to be completely uploaded before I can access the data?

  • Where can I read more about this?


Solution

  • "Is it possible to stream the file, while being uploaded over HTTP?"

    -- Yes. This is how multipart/form-data handles file uploads. According to RFC 7578:

    The media type multipart/form-data follows the model of multipart MIME data streams

    "Or is it a limitation of the HTTP protocol that the file has to be completely uploaded before I can access the data?"

    -- No. You can access the data as soon as some bytes have been uploaded; you don't need to wait for the upload to complete. However, I'm not familiar with zlib, and I'm not sure whether it can work on only part of the file's bytes.
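
On the zlib point, a quick check is possible with Node's built-in zlib module, whose `createGzip()` returns a transform stream that accepts input piece by piece. The following is a minimal sketch, assuming Node.js and the gzip format; `gzipChunks` is a hypothetical helper name, and the demo ignores backpressure because the chunks are tiny:

```javascript
const zlib = require('zlib');

// Hypothetical helper: feed Buffers into gzip one chunk at a time
// and resolve with the complete compressed output.
function gzipChunks(chunks) {
  return new Promise((resolve, reject) => {
    const gzip = zlib.createGzip();
    const out = [];
    gzip.on('data', (part) => out.push(part)); // compressed bytes as they are produced
    gzip.on('end', () => resolve(Buffer.concat(out)));
    gzip.on('error', reject);
    for (const chunk of chunks) {
      gzip.write(chunk); // each write supplies only part of the input
    }
    gzip.end(); // signal that no more input will follow
  });
}
```

If the result decompresses back to the concatenated input, that suggests the compressor does not need the whole file up front.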

    You can run a small experiment to demonstrate this "access data while it is uploading" behaviour. Here is a simple Node.js web application snippet, but you can implement this example with any server-side technology:

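    const Busboy = require('busboy'); // busboy 0.x API (constructor form)
    // `router` is assumed to be an Express router
    router.post('/upload', function (req, res) {
      const busboy = new Busboy({ headers: req.headers });
      busboy.on('file', function (fieldName, fileStream, fileName) {
        // 'data' fires per chunk as it arrives, before the upload completes
        fileStream.on('data', function (data) {
          console.log(data);
        });
      });
      busboy.on('finish', function () { res.end(); });
      req.pipe(busboy); // feed the request body into the parser
    });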
    

    Once a large file has been submitted and the POST /upload request is in flight, you can observe that the uploaded bytes are already being printed while the HTTP request is still pending in the browser's network debug panel:

    [Screenshot: server console printing received chunks while the request is still pending]

    (Please note I'm using "Fast 3G" throttling to simulate a slow network.)
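
Putting the pieces together for the original goal, one possible approach is to pipe the upload stream through gzip straight into the destination, so that neither the raw file nor the compressed result has to be held in full. This is only a sketch under assumptions: it uses Node's built-in `stream.pipeline`, writes to a local file as a stand-in for the real storage, and `gzipToFile` and the example path are hypothetical names.

```javascript
const fs = require('fs');
const zlib = require('zlib');
const { pipeline } = require('stream');

// Hypothetical helper: gzip any readable stream into a file at destPath.
// pipeline() propagates backpressure and errors between the stages, so
// memory use stays close to the streams' internal buffer sizes.
function gzipToFile(source, destPath, callback) {
  pipeline(source, zlib.createGzip(), fs.createWriteStream(destPath), callback);
}

// Possible use inside the busboy 'file' handler (the path is a placeholder):
// gzipToFile(fileStream, '/tmp/upload.gz', function (err) { /* respond here */ });
```

The file write stream could presumably be swapped for any other writable stream, such as one provided by a storage client, if the 512 MB disk limit rules out a local file.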