
Should I use .tar.gz?


In the Unix world, there is a famous format called "tar.gz".

I am now developing a game, where random access to individual files would be more efficient. If the files are archived first and then compressed, they can only be read sequentially.

I know there are alternative formats such as zip and 7z, but what about others?

Beyond tar.gz, I'd like a small compression library that also provides archiving features.

Should I use *.tar, or are other solutions available?

PS: I'm using C++.


Solution

  • "Random" access works poorly on a .tar.gz, since that is a .tar file wrapped in a single gzip stream: to reach a given file inside the .tar, you first have to decompress everything that precedes it.

    It would be possible to use a .tar file that contains individually gzip-compressed files. tar has no central table of contents, but you can scan the per-file headers once, store where each file sits in the archive, and then extract files as you need them. However, you may find that using your own format is "better". For example, if I remember correctly, a tar archive writes its headers one file at a time, each directly in front of that file's data, whereas you may want to build the whole header in one lump and store it before the file data. That does mean enumerating all the relevant files first, then compressing them, and finally "patching up" the header with the offsets of the compressed data.

    For a game, one critical factor is probably decompression speed, so it is worth comparing libraries to see which one decompresses fastest. I found this when searching for a comparison: http://catchchallenger.first-world.info//wiki/Quick_Benchmark:_Gzip_vs_Bzip2_vs_LZMA_vs_XZ_vs_LZ4_vs_LZO

    You may also care about memory usage, which varies somewhat between algorithms.

    I'm also guessing your individual files will be much smaller than the entire Linux tarball, so you may want to run your own benchmark on your own data. After all, the speed of a compression format depends, to some degree, on the kind of data it is given.
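
The "header in one lump" idea can be sketched roughly as follows. This is only an illustration under my own assumptions: the on-disk layout and the names `PackEntry`, `write_pack`, `read_index` and `read_blob` are made up for this example and are not part of tar or any existing library. Blobs are stored exactly as given; in a real game you would probably compress each one separately before packing it.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical pack layout (not a standard format):
//   [u32 entry count]
//   per entry: [u32 name length][name bytes][u64 offset][u64 size]
//   [blob 0][blob 1]...
// Integers are written in host byte order to keep the sketch short.
struct PackEntry {
    std::uint64_t offset;  // absolute position of the blob in the pack file
    std::uint64_t size;    // stored size of the blob in bytes
};
using PackIndex = std::map<std::string, PackEntry>;
using NamedBlobs = std::vector<std::pair<std::string, std::string>>;

template <typename T>
void write_pod(std::ostream& out, const T& value) {
    out.write(reinterpret_cast<const char*>(&value), sizeof value);
}

template <typename T>
T read_pod(std::istream& in) {
    T value{};
    in.read(reinterpret_cast<char*>(&value), sizeof value);
    if (!in) throw std::runtime_error("truncated pack file");
    return value;
}

// Writes the whole index first, then all blobs back to back.
void write_pack(const std::string& path, const NamedBlobs& files) {
    // The index size depends only on the names, so the position of the
    // first blob can be computed before anything is written.
    std::uint64_t first_blob = sizeof(std::uint32_t);
    for (const auto& f : files)
        first_blob += sizeof(std::uint32_t) + f.first.size() + 2 * sizeof(std::uint64_t);

    std::ofstream out(path, std::ios::binary);
    if (!out) throw std::runtime_error("cannot create " + path);

    std::uint64_t offset = first_blob;
    write_pod(out, static_cast<std::uint32_t>(files.size()));
    for (const auto& f : files) {
        write_pod(out, static_cast<std::uint32_t>(f.first.size()));
        out.write(f.first.data(), static_cast<std::streamsize>(f.first.size()));
        write_pod(out, offset);
        write_pod(out, static_cast<std::uint64_t>(f.second.size()));
        offset += f.second.size();
    }
    // Sanity check: the index must end exactly where the first blob starts.
    const std::streamoff pos = out.tellp();
    assert(static_cast<std::uint64_t>(pos) == first_blob);

    for (const auto& f : files)
        out.write(f.second.data(), static_cast<std::streamsize>(f.second.size()));
    if (!out) throw std::runtime_error("write failed for " + path);
}

// Reads only the index; no blob data is touched.
PackIndex read_index(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);
    PackIndex index;
    const auto count = read_pod<std::uint32_t>(in);
    for (std::uint32_t i = 0; i < count; ++i) {
        const auto name_len = read_pod<std::uint32_t>(in);
        std::string name(name_len, '\0');
        in.read(&name[0], static_cast<std::streamsize>(name_len));
        const auto offset = read_pod<std::uint64_t>(in);
        const auto size = read_pod<std::uint64_t>(in);
        index[name] = PackEntry{offset, size};
    }
    return index;
}

// Random access: seek straight to one blob and read just its bytes.
std::string read_blob(const std::string& path, const PackEntry& entry) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);
    in.seekg(static_cast<std::streamoff>(entry.offset));
    std::string data(entry.size, '\0');
    in.read(&data[0], static_cast<std::streamsize>(entry.size));
    if (!in) throw std::runtime_error("truncated blob");
    return data;
}
```

Typical use would be to call `read_index` once at start-up and `read_blob` whenever an asset is needed, then decompress just that blob. If the compressed sizes are not known before writing starts, an alternative is to write a placeholder index, write the blobs, and then seek back to patch in the real offsets.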
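
For the "do your own benchmark" part, a minimal timing harness could look something like this. It is a sketch, not a rigorous benchmark: `BenchResult`, `Decompressor` and `bench_decompress` are names invented here, and `rle_decode` is only a toy run-length decoder standing in for whatever real library you want to measure. You would wrap each candidate library's decompression call in a function with the same signature and feed it your own game data.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <string>

// Anything that turns compressed bytes into decompressed bytes.
using Decompressor = std::function<std::string(const std::string&)>;

struct BenchResult {
    double seconds;         // wall-clock time for all iterations together
    std::size_t bytes_out;  // decompressed size of a single run
    int iterations;         // how many times the input was decompressed

    // Decompressed megabytes per second, or 0 if the run was too fast to time.
    double throughput_mb_s() const {
        if (seconds <= 0.0) return 0.0;
        return (static_cast<double>(bytes_out) * iterations) / (1024.0 * 1024.0) / seconds;
    }
};

// Decompresses `input` repeatedly and measures the total elapsed time.
BenchResult bench_decompress(const Decompressor& decompress,
                             const std::string& input, int iterations) {
    assert(iterations > 0);
    std::size_t bytes_out = 0;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        bytes_out = decompress(input).size();
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return BenchResult{elapsed.count(), bytes_out, iterations};
}

// Toy stand-in decoder: the input is a sequence of (count, byte) pairs.
// Replace this with a wrapper around the library you actually want to test.
std::string rle_decode(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i + 1 < in.size(); i += 2) {
        const auto count = static_cast<std::size_t>(static_cast<unsigned char>(in[i]));
        out.append(count, in[i + 1]);
    }
    return out;
}
```

Running `bench_decompress` with the same input and iteration count for each candidate gives numbers that can be compared directly. Since game assets tend to be small, it is probably worth timing many small inputs rather than one large one.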