I may be wrong, but this is how I understand Node's createGzip():

- stream.write() accumulates data in the stream.
- stream.end() then lets Gzip do its magic on whatever is in the stream and unload everything.

This requires a fairly large amount of memory, because the stream has to hold everything until I call end().
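Roughly, this is the pattern I mean (the file path and chunks below are just placeholders):

```js
const fs = require('fs');
const zlib = require('zlib');

const gzip = zlib.createGzip();
gzip.pipe(fs.createWriteStream('output.gz')); // placeholder output path

// Feed the data in chunks...
gzip.write(Buffer.from('first chunk of my data'));
gzip.write(Buffer.from('second chunk of my data'));

// ...and only then signal that no more input is coming.
gzip.end();
```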
Can I somehow "train" gzip on a small amount of data (I mean train it how to compress; my data has similar patterns throughout the entire dataset) and then just stream everything through it without waiting for stream.end()?
I want to compress ~100GB of data, and the stream simply won't be able to accumulate that much within the runtime's memory limits.
Yes - Node's zlib implementation works as described in the post: it accumulates the entire input in its buffer before end() is called.
One solution is to use the pako package, which is a custom zlib wrapper.
It doesn't support streams out of the box, but you can override its onData() method and turn it into a stream fairly easily. That way you get true streaming compression.
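For example, here is a rough, untested sketch of wrapping pako.Deflate in a Node Transform stream by overriding its onData() callback; the class name and file paths are placeholders, and it assumes pako's documented push()/onData() API:

```js
const { Transform } = require('stream');
const pako = require('pako');

class GzipTransform extends Transform {
  constructor() {
    super();
    this._deflator = new pako.Deflate({ gzip: true, level: 6 });
    // pako invokes onData() with each compressed chunk as soon as it is
    // produced, so we can forward it downstream instead of buffering it.
    this._deflator.onData = (chunk) => this.push(Buffer.from(chunk));
  }

  _transform(chunk, encoding, callback) {
    this._deflator.push(chunk, false); // false = more input will follow
    callback();
  }

  _flush(callback) {
    this._deflator.push(new Uint8Array(0), true); // true = finish the gzip stream
    callback();
  }
}

// Usage: compress a large file chunk by chunk.
const fs = require('fs');
fs.createReadStream('input.dat')           // placeholder input path
  .pipe(new GzipTransform())
  .pipe(fs.createWriteStream('output.gz')); // placeholder output path
```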