I compressed a large regular Unix file (.dat) using the tar -cvzf command. The file is around 200 GB in size; after compression it became 27 GB. But while reading the data in the compressed file, I can see anonymous data added at the start of the file. Is this possible? I tried to unzip that file again and found that the unzipped file has no such anonymous records.
The GNU tar command is free software. Please study its source code. Read of course its man page, tar(1).
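Indeed, a tar archive starts with a header documented in the header file tar.h. That header (file name, mode, size and so on) is probably the extra data you see before your file's contents; it belongs to the archive, not to your file. There is a POSIX standard related to tar.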
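To see the tar header in practice, here is a minimal Python sketch using only the standard library. It is an illustration, not the way GNU tar is implemented: the file name data.dat, the sample payload and the helper make_tar_gz are hypothetical, and the archive is built in memory in ustar format rather than with the tar command. It shows that the compressed stream begins with the gzip magic bytes, that the decompressed stream begins with a 512-byte tar header, and that extracting the member returns exactly the original bytes.

```python
import gzip
import io
import tarfile

def make_tar_gz(name: str, payload: bytes) -> bytes:
    """Build a gzip-compressed tar archive in memory (hypothetical helper)."""
    buf = io.BytesIO()
    # USTAR_FORMAT is chosen explicitly so the header layout is predictable.
    with tarfile.open(fileobj=buf, mode="w:gz", format=tarfile.USTAR_FORMAT) as tar:
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()

payload = b"first line\nsecond line\n"      # stand-in for the .dat contents
archive = make_tar_gz("data.dat", payload)

# The compressed file starts with the gzip magic number, not with the payload.
gzip_magic = archive[:2]

# After decompression, the first 512-byte block is the tar header.
raw_tar = gzip.decompress(archive)
header = raw_tar[:512]
name_field = header[:8]          # member name, NUL-padded in a 100-byte field
ustar_magic = header[257:262]    # "ustar" marker at offset 257

# The file contents only begin after the header block.
data_after_header = raw_tar[512:512 + len(payload)]

# Extracting the member yields the original bytes, without any header.
with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
    extracted = tar.extractfile("data.dat").read()

print(gzip_magic.hex(), name_field, ustar_magic, extracted == payload)
```

So bytes seen at the start of the archive that are absent after extraction are consistent with the archive's own metadata rather than with corruption.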
See also Peter Miller's tardy utility.
Don't confuse tar archives with zip ones handled by Info-ZIP (so the zip and unzip commands).
GNU zip (a compressor, the gzip program, which can be started by tar, notably by your tar czvf command) is also free software, and of course you should study its source code if interested.
Some Unix shells (notably sash or busybox) have a builtin tar.
I tried to unzip that file again and found that unzipped file has no such anonymous records.
AFAIK, most Linux filesystems try to implement, more or less, the POSIX standard, based upon the read(2) and write(2) system calls, and they don't know about records. If you need "records", consider using databases (like SQLite or PostgreSQL) or indexed files (like GDBM), both built above Linux file systems or block devices.
Read also a good textbook on operating systems.
Notice that "a large regular unix file" is mostly a sequence of bytes. There is no notion of records inside it, except as a convention used by user-space programs through syscalls(2). See also path_resolution(7) and inode(7).
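As an illustration of "records as a convention", here is a small Python sketch. The record layout (a 4-byte id followed by an 8-byte name), the file name sample.dat and the helper functions are all hypothetical; the point is only that the file system stores a flat run of bytes, and it is the program that decides where one record ends and the next begins.

```python
import os
import struct
import tempfile

# Hypothetical record layout: little-endian 4-byte unsigned id + 8-byte name.
RECORD = struct.Struct("<I8s")

def write_records(path, records):
    """Append each (id, name) pair to the file as one fixed-size chunk of bytes."""
    with open(path, "wb") as f:
        for rec_id, name in records:
            f.write(RECORD.pack(rec_id, name))

def read_records(path):
    """Read the whole file as bytes, then slice it by the agreed record size."""
    with open(path, "rb") as f:
        data = f.read()
    return [RECORD.unpack_from(data, offset)
            for offset in range(0, len(data), RECORD.size)]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.dat")
    write_records(path, [(1, b"alpha"), (2, b"beta")])
    size = os.path.getsize(path)   # the kernel only knows this byte count
    records = read_records(path)   # "records" exist only in this program

print(size, records)
```

A program that assumed a different record size would read the same bytes and reconstruct different "records", which is why the file system itself cannot tell you where records start.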