
Which delimiter can I safely use to separate zlib-deflated strings in Node?


I need to send content from a client to a remote server using Node.js. The content can be anything, since a user can upload any file.

Each piece of content is compressed with zlib.deflate before it is sent to the remote server. I'd rather not make multiple round trips, so I want to send all of the pieces at once.

To separate the pieces, I need a delimiter character that cannot appear in the compressed data, so that the remote server can split on it safely.


Solution

  • There is no such character or sequence of characters: zlib-compressed data can contain any sequence of bytes.

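    You could re-encode the zlib-compressed data so that one particular byte value never appears in it, at the cost of expanding the data slightly, and then use that byte value as the delimiter. The example below avoids the byte value 0, so 0 can serve as the delimiter.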

    Example code:

    // Example of encoding binary data to a sequence of bytes with no zero values.
    // The result is expanded slightly. On average, assuming random input, the
    // expansion is less than 0.1%. The maximum expansion is less than 14.3%, which
    // is reached only if the input is a sequence of bytes all with value 255.
    
    #include <stdio.h>
    
    // Encode binary data read from in, to a sequence of byte values in 1..255
    // written to out. There will be no zero byte values in the output. The
    // encoding is decoding a flat (equiprobable) Huffman code of 255 symbols.
    void no_zeros_encode(FILE *in, FILE *out) {
        unsigned buf = 0;
        int bits = 0, ch;
        do {
            if (bits < 8) {
                ch = getc(in);
                if (ch != EOF) {
                    buf += (unsigned)ch << bits;
                    bits += 8;
                }
                else if (bits == 0)
                    break;
            }
            if ((buf & 127) == 127) {
                putc(255, out);
                buf >>= 7;
                bits -= 7;
            }
            else {
                unsigned val = buf & 255;
                buf >>= 8;
                bits -= 8;
                if (val < 127)
                    val++;
                putc(val, out);
            }
        } while (ch != EOF);
    }
    
    // Decode a sequence of byte values made by no_zeros_encode() read from in, to
    // the original binary data written to out. The decoding is encoding a flat
    // Huffman code of 255 symbols. no_zeros_encode() will not generate any zero
    // byte values in its output (that's the whole point), but if there are any
    // zeros in the input to no_zeros_decode(), they are ignored.
    void no_zeros_decode(FILE *in, FILE *out) {
        unsigned buf = 0;
        int bits = 0, ch;
        while ((ch = getc(in)) != EOF)
            if (ch != 0) {              // could flag any zeros as an error
                if (ch == 255) {
                    buf += 127 << bits;
                    bits += 7;
                }
                else {
                    if (ch <= 127)
                        ch--;
                    buf += (unsigned)ch << bits;
                    bits += 8;
                }
                if (bits >= 8) {
                    putc(buf, out);
                    buf >>= 8;
                    bits -= 8;
                }
            }
    }
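The expansion figures in the header comment of the example (under 0.1% on average, under 14.3% at worst) follow from a short calculation. Assuming, as that comment does, random input bits: each step of no_zeros_encode() writes one output byte, and it finds seven 1 bits at the bottom of its buffer with probability 1/128, in which case it consumes 7 input bits; otherwise it consumes 8.

```latex
\begin{aligned}
\Pr[\text{low 7 bits} = 127] &= 2^{-7} = \tfrac{1}{128} \\
\mathbb{E}[\text{input bits per output byte}] &= 7 \cdot \tfrac{1}{128} + 8 \cdot \tfrac{127}{128} = \tfrac{1023}{128} \approx 7.9922 \\
\text{average expansion} &= \frac{8}{1023/128} = \frac{1024}{1023} \approx 1.00098 \quad (\approx 0.098\%) \\
\text{worst case (every input byte is 255)} &= \frac{8}{7} \approx 1.1429 \quad (\approx 14.29\%)
\end{aligned}
```

The worst case arises because an all-ones input makes every step take the 7-bit branch, so every 7 input bits become 8 output bits.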
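The functions above work on FILE streams. For the situation in the question, where several deflated pieces go out in one message, it may be handier to work on memory buffers and frame each piece with a 0 terminator. What follows is a minimal sketch under that assumption: nz_encode_buf() and nz_decode_buf() are hypothetical buffer-based rewrites of the same bit-level scheme as no_zeros_encode()/no_zeros_decode(), and nz_frame_append()/nz_frame_next() are hypothetical helpers that show how the 0 delimiter would be written and split on. It is not a drop-in Node.js API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Buffer-based variant of no_zeros_encode(): same bit-level scheme, but reads
// len bytes from in[] and writes to out[]. Returns the number of bytes written,
// none of which is 0. out[] needs room for len + len / 7 + 1 bytes (worst case
// is 8 output bytes per 7 input bytes).
size_t nz_encode_buf(const unsigned char *in, size_t len, unsigned char *out) {
    unsigned buf = 0;
    int bits = 0;
    size_t i = 0, n = 0;
    while (i < len || bits > 0) {
        if (bits < 8 && i < len) {      // keep at least 8 bits buffered
            buf += (unsigned)in[i++] << bits;
            bits += 8;
        }
        if ((buf & 127) == 127) {       // seven 1 bits -> output byte 255
            out[n++] = 255;
            buf >>= 7;
            bits -= 7;
        }
        else {                          // eight bits -> 1..127 or 128..254
            unsigned val = buf & 255;
            buf >>= 8;
            bits -= 8;                  // may go negative on the final partial byte
            if (val < 127)
                val++;
            out[n++] = (unsigned char)val;
        }
    }
    return n;
}

// Buffer-based variant of no_zeros_decode(). Each input byte yields at most
// one output byte, so out[] needs room for len bytes. Zero bytes are skipped.
size_t nz_decode_buf(const unsigned char *in, size_t len, unsigned char *out) {
    unsigned buf = 0;
    int bits = 0;
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned ch = in[i];
        if (ch == 0)
            continue;
        if (ch == 255) {                // 255 stands for seven 1 bits
            buf += 127u << bits;
            bits += 7;
        }
        else {                          // undo the +1 applied to 0..126
            if (ch <= 127)
                ch--;
            buf += ch << bits;
            bits += 8;
        }
        if (bits >= 8) {                // emit a byte once 8 bits are available
            out[n++] = (unsigned char)(buf & 255);
            buf >>= 8;
            bits -= 8;
        }
    }
    return n;
}

// Hypothetical framing helper: append one piece to stream[] at offset pos as
// its encoded bytes followed by a 0 delimiter. Returns the new stream length.
size_t nz_frame_append(unsigned char *stream, size_t pos,
                       const unsigned char *piece, size_t len) {
    size_t n = nz_encode_buf(piece, len, stream + pos);
    assert(memchr(stream + pos, 0, n) == NULL);  // the delimiter never occurs
    stream[pos + n] = 0;
    return pos + n + 1;
}

// Hypothetical framing helper: decode the piece starting at *pos into out[]
// and advance *pos past its delimiter. Returns the decoded length, or
// (size_t)-1 if no complete (0-terminated) piece remains.
size_t nz_frame_next(const unsigned char *stream, size_t len, size_t *pos,
                     unsigned char *out) {
    if (*pos >= len)
        return (size_t)-1;
    const unsigned char *end = memchr(stream + *pos, 0, len - *pos);
    if (end == NULL)
        return (size_t)-1;
    size_t seg = (size_t)(end - (stream + *pos));
    size_t n = nz_decode_buf(stream + *pos, seg, out);
    *pos += seg + 1;                    // skip the encoded bytes and the 0
    return n;
}
```

On the sending side you would call nz_frame_append() once per deflated piece and send the whole buffer. On the receiving side you would call nz_frame_next() in a loop until it returns (size_t)-1 and then inflate each decoded piece. Because 0 never appears inside an encoded piece, even an empty piece is representable: it is just a lone 0.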