linux compression gzip tar unzip

Is it possible that tar -cvzf adds file information while zipping a large file? (The file is around 200 GB.)


I zipped a large regular Unix file (.dat) using the tar -cvzf command. This file is around 200 GB in size. After zipping it became 27 GB. But while reading data in that zipped file I can see anonymous data added at the start of the file. Is this possible? I tried to unzip that file again and found that unzipped file has no such anonymous records.


Solution

  • GNU tar is free software, so you can study its source code. Start, of course, with its man page, tar(1).

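    Indeed, a tar archive starts with a header, documented in the header file tar.h. That header block holds metadata about the archived file (its name, permissions, owner, size and modification time), so the extra data you see at the start of the archive is expected, and extraction strips it again. There is a POSIX standard related to tar.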

    See also Peter Miller's tardy utility.

    Don't confuse tar archives with zip archives, which are handled by Info-ZIP (that is, by the zip and unzip commands).

    GNU zip (a compressor: the gzip program, which tar can start, notably with your tar czvf command) is also free software, and of course you should study its source code if interested.

    Some Unix shells (notably sash or busybox) have a builtin tar.

    You wrote: "I tried to unzip that file again and found that unzipped file has no such anonymous records."

    AFAIK, most Linux filesystems implement, more or less, the POSIX file model, which is based upon the read(2) and write(2) system calls, and they know nothing about records. If you need "records", consider using a database (like SQLite or PostgreSQL) or indexed files (like GDBM); both are built on top of Linux file systems or block devices.

    Read also a good textbook on operating systems.

    Notice that "a large regular unix file" is mostly a sequence of bytes. There is no notion of records inside it, except as a convention used by user-space programs through syscalls(2). See also path_resolution(7) and inode(7).
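
To make the header point concrete, here is a minimal sketch in Python. It uses the standard library tarfile and gzip modules rather than the GNU tar binary, and the file name sample.dat and the tiny payload are hypothetical stand-ins for the 200 GB file. It builds a .tar.gz in memory, removes only the gzip layer, and shows that the tar stream begins with a 512-byte header block, while proper extraction gives back exactly the original bytes.

```python
import gzip
import io
import tarfile


def build_tar_gz(name: str, payload: bytes) -> bytes:
    """Return the bytes of a .tar.gz archive holding one regular file."""
    buf = io.BytesIO()
    # "w:gz" is roughly what `tar -czf` does: build a tar stream, then gzip it.
    # USTAR_FORMAT is chosen here (an assumption of this sketch) so that the
    # first block is guaranteed to be the member's own header.
    with tarfile.open(fileobj=buf, mode="w:gz",
                      format=tarfile.USTAR_FORMAT) as tar:
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


# Hypothetical stand-in for the large .dat file.
payload = b"first line of data\nsecond line of data\n"
archive = build_tar_gz("sample.dat", payload)

# A gzip stream starts with the two magic bytes 0x1f 0x8b.
gzip_magic = archive[:2]

# Undo only the gzip layer: what remains is the raw tar stream.
raw_tar = gzip.decompress(archive)

# The first 512-byte block is the tar header, not the file's data.
header = raw_tar[:512]
member_name = header[:100].rstrip(b"\0").decode("ascii")
magic = header[257:262]  # the "ustar" marker of POSIX-style archives

# The file's own bytes begin right after the header block.
data_in_archive = raw_tar[512:512 + len(payload)]

# Proper extraction skips the header and returns only the original bytes.
with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
    extracted = tar.extractfile("sample.dat").read()

print("gzip magic:", gzip_magic.hex())
print("name stored in header:", member_name)
print("data follows header unchanged:", data_in_archive == payload)
print("extracted equals original:", extracted == payload)
```

So if you gunzip a .tar.gz without also untarring it, or read the tar stream directly, you will see the header bytes before your data; running tar -xzf (or the equivalent) removes them.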