c++ zip archive

How can I use a zip as a container for my data?


I often come across file types that carry a zip signature but are not meant to be used as ordinary compressed archives: judging by the signature they look like zip files, but in practice they are containers for a custom dataset.

For example, .blend or .apk files can be opened with an archive utility.

Programmatically, how can I define my own file type that uses a zip as its container, so that I avoid the complicated work of designing a new file format from scratch?

I'm interested in this for C/C++ programming.


EDIT:

I would also like to stress that I am interested in containers as a way to avoid platform-related issues, such as those concerning encoding and data representation.


Solution

  • I suspect you're asking how to create something that is exactly like a zip file but has a different file extension (which, as far as I know, is the tactic used for .apk files, for example).

    The rather simple answer is that you create the file exactly as you would create any zip archive and give it a file name with your own extension. As mentioned in the comments, there are various libraries for creating and processing zip files that you can use.
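
    To make the idea concrete, here is a minimal sketch that writes a zip container by hand using only the standard library. It stores entries uncompressed and ignores Zip64, timestamps, and encryption, so in practice one of the existing zip libraries is the better choice. All names here (`build_stored_zip`, `write_container`, the `.mydata` extension) are hypothetical and chosen only for illustration.

```cpp
#include <cstdint>
#include <fstream>
#include <string>
#include <utility>
#include <vector>

// Append little-endian integers byte by byte, independent of host byte order.
static void put16(std::string& out, std::uint16_t v) {
    out.push_back(static_cast<char>(v & 0xFF));
    out.push_back(static_cast<char>((v >> 8) & 0xFF));
}
static void put32(std::string& out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<char>((v >> (8 * i)) & 0xFF));
}

// CRC-32 with the reflected polynomial 0xEDB88320, as the zip format requires.
static std::uint32_t zip_crc32(const std::string& data) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (unsigned char byte : data) {
        crc ^= byte;
        for (int k = 0; k < 8; ++k)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
    }
    return crc ^ 0xFFFFFFFFu;
}

using Entry = std::pair<std::string, std::string>;  // (name in archive, contents)

// Build the bytes of a zip archive whose entries are stored uncompressed.
// Assumes fewer than 65536 entries and a total size well below 4 GiB.
std::string build_stored_zip(const std::vector<Entry>& entries) {
    std::string body;     // local file headers followed by file data
    std::string central;  // central directory records
    for (const Entry& e : entries) {
        const std::string& name = e.first;
        const std::string& data = e.second;
        const std::uint32_t crc = zip_crc32(data);
        const std::uint32_t size = static_cast<std::uint32_t>(data.size());
        const std::uint32_t offset = static_cast<std::uint32_t>(body.size());
        const std::uint16_t name_len = static_cast<std::uint16_t>(name.size());

        put32(body, 0x04034b50u);  // local file header signature "PK\3\4"
        put16(body, 20);           // version needed to extract (2.0)
        put16(body, 0);            // general purpose flags
        put16(body, 0);            // compression method 0 = stored
        put16(body, 0);            // modification time 00:00:00
        put16(body, 0x0021);       // modification date 1980-01-01
        put32(body, crc);
        put32(body, size);         // compressed size (same as raw: stored)
        put32(body, size);         // uncompressed size
        put16(body, name_len);
        put16(body, 0);            // extra field length
        body += name;
        body += data;

        put32(central, 0x02014b50u);  // central directory signature "PK\1\2"
        put16(central, 20);           // version made by
        put16(central, 20);           // version needed to extract
        put16(central, 0);            // general purpose flags
        put16(central, 0);            // compression method
        put16(central, 0);            // modification time
        put16(central, 0x0021);       // modification date
        put32(central, crc);
        put32(central, size);
        put32(central, size);
        put16(central, name_len);
        put16(central, 0);            // extra field length
        put16(central, 0);            // comment length
        put16(central, 0);            // disk number start
        put16(central, 0);            // internal attributes
        put32(central, 0);            // external attributes
        put32(central, offset);       // offset of the local header
        central += name;
    }
    const std::uint32_t cd_offset = static_cast<std::uint32_t>(body.size());
    const std::uint32_t cd_size = static_cast<std::uint32_t>(central.size());
    const std::uint16_t count = static_cast<std::uint16_t>(entries.size());

    std::string out = body + central;
    put32(out, 0x06054b50u);  // end of central directory signature "PK\5\6"
    put16(out, 0);            // number of this disk
    put16(out, 0);            // disk where the central directory starts
    put16(out, count);        // entries on this disk
    put16(out, count);        // entries in total
    put32(out, cd_size);
    put32(out, cd_offset);
    put16(out, 0);            // archive comment length
    return out;
}

// Write the archive under any file name; the extension is purely a convention.
bool write_container(const std::string& path, const std::vector<Entry>& entries) {
    std::ofstream file(path, std::ios::binary);
    const std::string bytes = build_stored_zip(entries);
    file.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    return static_cast<bool>(file);
}
```

    Calling `write_container("project.mydata", {{"manifest.txt", "version=1\n"}})` should produce a file that ordinary archive utilities can list and extract, even though its name does not end in `.zip`.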

    Such a file could (like an APK) still be opened in an archive utility, but by default it would be associated with some other action on the system (just as .apk files are associated with installation and .blend files with Blender).
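
    Archive utilities can open such a file because they look at its contents rather than its extension. As a rough sketch of that check, the following tests whether the leading bytes are a zip signature. It covers only the common case where the archive starts at byte 0 (not, for example, self-extracting archives), and the function names are hypothetical.

```cpp
#include <fstream>
#include <string>

// True if the buffer begins with a zip signature.
bool has_zip_signature(const std::string& bytes) {
    if (bytes.size() < 4) return false;
    const std::string head = bytes.substr(0, 4);
    return head == std::string("PK\x03\x04", 4)   // local file header (archive with entries)
        || head == std::string("PK\x05\x06", 4);  // end of central directory (empty archive)
}

// True if the file at `path` can be read and starts with a zip signature.
bool file_has_zip_signature(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    char buf[4] = {0, 0, 0, 0};
    in.read(buf, 4);
    if (in.gcount() != 4) return false;  // missing, unreadable, or too short
    return has_zip_signature(std::string(buf, 4));
}
```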

    This technique is extremely common: many widely used application file formats are little more than a zipped collection of files plus a standard index file that describes how they fit together.

    Also note @Jan Hudec's comment about encoding issues: there is no escaping them through this route.

    On encoding (I suggest you ask a new, separate question about these issues; many here have more hands-on knowledge than I do): you mention that you want to get exactly the same data on potentially heterogeneous systems. I think you must use a text encoding. There is no guarantee that type sizes, alignment, or representation are the same once your C code is compiled on different systems, so simply writing out binary blobs isn't very reliable. Be as specific as possible when choosing the character set for the text encoding, so that the other system can match it exactly.
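
    As one possible illustration of the text-encoding advice, the sketch below writes a record as `key=value` lines instead of dumping the struct's bytes, then parses it back. The `Settings` record, its field names, and the line format are assumptions made up for this example; the text is assumed to be UTF-8 with `\n` line endings, and the streams use the classic "C" locale so that numbers are formatted identically everywhere.

```cpp
#include <cstdint>
#include <limits>
#include <locale>
#include <sstream>
#include <string>

struct Settings {         // hypothetical record stored inside the container
    std::int32_t width;
    std::int32_t height;
    std::string title;    // assumed UTF-8, without newlines
};

// Serialize as text; nothing depends on struct layout, padding, or byte order.
std::string to_text(const Settings& s) {
    std::ostringstream out;
    out.imbue(std::locale::classic());  // no locale-specific digit grouping
    out << "width=" << s.width << '\n'
        << "height=" << s.height << '\n'
        << "title=" << s.title << '\n';
    return out.str();
}

// Parse a decimal integer and reject trailing junk or out-of-range values.
static bool parse_i32(const std::string& value, std::int32_t& result) {
    std::istringstream in(value);
    in.imbue(std::locale::classic());
    long long v = 0;
    if (!(in >> v) || !in.eof()) return false;
    if (v < std::numeric_limits<std::int32_t>::min() ||
        v > std::numeric_limits<std::int32_t>::max()) return false;
    result = static_cast<std::int32_t>(v);
    return true;
}

// Returns true only if every expected key is present and well formed.
// Unknown keys are skipped so that newer writers stay readable.
bool from_text(const std::string& text, Settings& s) {
    std::istringstream in(text);
    std::string line;
    unsigned seen = 0;  // bit 0: width, bit 1: height, bit 2: title
    while (std::getline(in, line)) {
        const std::string::size_type eq = line.find('=');
        if (eq == std::string::npos) return false;
        const std::string key = line.substr(0, eq);
        const std::string value = line.substr(eq + 1);
        if (key == "width") {
            if (!parse_i32(value, s.width)) return false;
            seen |= 1u;
        } else if (key == "height") {
            if (!parse_i32(value, s.height)) return false;
            seen |= 2u;
        } else if (key == "title") {
            s.title = value;
            seen |= 4u;
        }
    }
    return seen == 7u;
}
```

    The resulting string can be stored as one entry of the zip container; because it is plain text in a declared character set, a reader on another platform does not need to know how the writer's compiler laid out the struct.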