
Websockets with Streaming Archives


So this is the setup I'm working with:

  1. I have an Express server that must stream an archived binary payload to a browser (it doesn't matter whether it is zip, tar, or tar.gz, although zip would be nice).
  2. On this server, I open a websocket to another server that sends me the binary payloads of individual files in a directory. These payloads arrive streamed, piece-by-piece, as buffers, and I process them serially: file-by-file, with one websocket per file and never more than one open at a time. This is the websocket library I'm using: https://github.com/einaros/ws

I would like to go through each file, open a websocket, and then append the buffers to an archiver as they come through the websocket. As data is appended to the archiver, it would be nice if I could stream the output of the archiver to the browser (via the response object with response.write). So, basically, as I'm getting the payload from the websocket, I would like that payload streamed through an archiver and then on to the response. :-)

Some things I have looked into:

  1. node-zipstream - This is nice because it gives me an output stream I can pipe directly to response.write. However, it doesn't appear to support nested files/folders, and, more importantly, it only accepts an input stream. Looking at the source code (which is quite terse and readable), it seems that if I had access to the update function inside ZipStream.prototype.addFile, I could call it on each message event when a binary buffer arrives from the websocket. That is quite messy/hacky, though, and given that the library already doesn't seem to support nested files/folders, I'm not sure I will be going with it.
  2. node-archiver - This suffers from the same issue as node-zipstream (probably because it was inspired by it): it lets me pipe the output, but I cannot append multiple buffers for the same file within the archive (and then manually signal when the last buffer has been appended for a given file). However, it does allow nested folders, which is a clear win over node-zipstream.

Is there something I'm not aware of, or is this just a really crazy thing that I want to do?

The only alternative I see at this point is to wait for the entire payload to be streamed through a websocket and then append with node-archiver, but I really would like to reap the benefit of true streaming/archiving on-the-fly.

I've also thought about creating a read stream of sorts just to serve as a proxy object that I can pass to node-archiver, and then simply appending the buffers I get from the websocket to this read stream. Looking at various read streams, though, I'm not sure how to do this. The only way I could think of was creating a write stream, piping buffers into it, and having a read stream read from that write stream. Am I on the right track here?

As always, thanks for any help/direction you can offer SO community.

EDIT:

Since I just opened this question, and I'm new to node, there may be a better answer than the one I provided. I will keep this question open and accept a better answer if one presents itself within a few days. As always, I will upvote any other answers, even if they're ridiculous, as long as they're correct and allow me to stream on-the-fly as mine does.


Solution

  • I figured out a way to get this working with node-archiver. :-)

    It was based on my hunch of creating a temporary "proxy stream" of sorts, inspired by this SO question: How to create streams from string in Node.Js?

    The basic gist is (coffeescript syntax):

    # assumes: archiver = require 'archiver'; Stream = require('stream').Stream
    archive = archiver 'zip'
    archive.pipe response # where response is the http response
    
    # and then for each file...
    fileName = ... # known file name
    fileSize = ... # known file size
    ws = .... # create websocket
    proxyStream = new Stream()
    numBytesStreamed = 0
    
    archive.append proxyStream, name: fileName
    
    ws.on 'message', (dataBuffer) ->
        numBytesStreamed += dataBuffer.length
        proxyStream.emit 'data', dataBuffer
    
        if numBytesStreamed is fileSize
            proxyStream.emit 'end'
            # function/indicator to do this for the next file in the folder
    
    # and then when you're completely done...
    archive.finalize (err, bytesOfArchive) ->
        if err?
            # handle the error
        else
            # note: trailers are only sent on a chunked response, and only
            # if a 'Trailer' header was declared up front - unless you
            # somehow knew the archive size ahead of time, Content-Length
            # can't go in the normal headers here
            response.addTrailers
                'Content-Length': bytesOfArchive
            response.end()
    

    Note that this is not the complete solution I implemented. There is still a lot of logic dealing with getting the files, their paths, etc. Not to mention error-handling.
