I have a zip file that contains a single file. I need to use multiple processes/nodes, each of which processes only a part of the zipped contents.
Let's say the decompressed data is 100MB, and it gets compressed to 60MB.
Any way to do either of these?
A) Seek into the zip file so I can extract only 1MB of decompressed data. Then I use 100 nodes, each processing 1MB after decompression.
Or
B) Just decompress 1MB of the compressed data per node. With the 60MB archive that means 60 nodes, and each one ends up processing a different amount of decompressed data.
It's also fine if the split points are not at 1MB, or even at equal intervals, if that simplifies the implementation. The goal is just that each node gets a different part of the data without having to decompress the whole source.
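For concreteness, option A expressed with Python's standard `zipfile` module would look roughly like the sketch below (the function name and the 1MB chunking are just placeholders; whether the seek actually skips work is exactly what I'm asking):

```python
import zipfile

CHUNK = 1 * 1024 * 1024  # 1 MB of decompressed data per node


def read_my_chunk(zip_path: str, node_index: int) -> bytes:
    """Read this node's 1 MB slice of the single file inside the archive."""
    with zipfile.ZipFile(zip_path) as zf:
        member = zf.infolist()[0]          # the archive holds a single file
        with zf.open(member) as f:
            # The question: does this seek jump ahead cheaply, or does it
            # have to decompress (and discard) everything before the offset?
            f.seek(node_index * CHUNK)
            return f.read(CHUNK)
```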
OK, I'll answer my own question.
No.
(See the comments for more info; basically, with zip you always have to decompress from the start of the stream, because the compressed data carries no index that maps decompressed offsets to positions in the file.)
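In practice that means every node has to stream-decompress from the beginning of the member and throw away everything before its own range. A minimal sketch of that fallback (the file path, slice size, and helper name are placeholders); note that `ZipExtFile.seek()` in Python 3.7+ does essentially the same thing under the hood, decompressing and discarding the prefix on a forward seek:

```python
import zipfile

SLICE = 1 * 1024 * 1024  # 1 MB of decompressed data per node (placeholder)


def node_slice(zip_path: str, node_index: int) -> bytes:
    """Decompress from the start, discard the prefix, keep this node's slice."""
    target = node_index * SLICE
    with zipfile.ZipFile(zip_path) as zf:
        member = zf.infolist()[0]                   # the single file in the archive
        with zf.open(member) as f:
            skipped = 0
            while skipped < target:                 # decompress and discard the prefix
                chunk = f.read(min(64 * 1024, target - skipped))
                if not chunk:                       # offset is past the end of the data
                    return b""
                skipped += len(chunk)
            return f.read(SLICE)
```

With K nodes this decompresses roughly K/2 copies of the full stream in total, so it only pays off when the per-node processing is much more expensive than the decompression itself.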