
TAR file format issue


I am unsure what the correct .tar file format is, because all three of the scenarios below work properly for me.

According to the .tar specification I have been working with, the magic field (ustar) is a null-terminated character string, and the version field is an octal number with no trailing null.

However, I reviewed several .tar files on my server and found different implementations of the magic and version fields. All three seem to work properly, probably because the system ignores those fields.

Compare the three bytes between the words ustar and root in the following examples:

Scenario 1 (20 20 00):

 000000F0      00 00 00 00 | 00 00 00 00 | 00 00 00 00      ............
 000000FC      00 00 00 00 | 00 75 73 74 | 61 72 20 20      .....ustar  
 00000108      00 72 6F 6F | 74 00 00 00 | 00 00 00 00      .root.......
 00000114      00 00 00 00 | 00 00 00 00 | 00 00 00 00      ............

Scenario 2 (00 20 20):

 000000F0      00 00 00 00 | 00 00 00 00 | 00 00 00 00      ............
 000000FC      00 00 00 00 | 00 75 73 74 | 61 72 00 20      .....ustar. 
 00000108      20 72 6F 6F | 74 00 00 00 | 00 00 00 00       root.......
 00000114      00 00 00 00 | 00 00 00 00 | 00 00 00 00      ............

Scenario 3 (00 00 00):

 000000F0      00 00 00 00 | 00 00 00 00 | 00 00 00 00      ............
 000000FC      00 00 00 00 | 00 75 73 74 | 61 72 00 00      .....ustar..
 00000108      00 72 6F 6F | 74 00 00 00 | 00 00 00 00      .root.......
 00000114      00 00 00 00 | 00 00 00 00 | 00 00 00 00      ............

Which one is the correct format?


Solution

  • In my opinion, none of your examples is correct, at least not for the POSIX format.
    The POSIX header is defined as follows:

    /* tar Header Block, from POSIX 1003.1-1990. */
    /* POSIX header */
    
    struct posix_header {   /* byte offset */
      char name[100];               /*   0 */
      char mode[8];                 /* 100 */
      char uid[8];                  /* 108 */
      char gid[8];                  /* 116 */
      char size[12];                /* 124 */
      char mtime[12];               /* 136 */
      char chksum[8];               /* 148 */
      char typeflag;                /* 156 */
      char linkname[100];           /* 157 */
      char magic[6];                /* 257 */
      char version[2];              /* 263 */
      char uname[32];               /* 265 */
      char gname[32];               /* 297 */
      char devmajor[8];             /* 329 */
      char devminor[8];             /* 337 */
      char prefix[155];             /* 345 */
    };
    
    #define TMAGIC   "ustar"        /* ustar and a null */
    #define TMAGLEN  6
    #define TVERSION "00"           /* 00 and no null */
    #define TVERSLEN 2
    

    Your first example (Scenario 1) appears to match the old GNU header format:

    /* OLDGNU_MAGIC uses both magic and version fields, which are contiguous.
       Found in an archive, it indicates an old GNU header format, which will be
       hopefully become obsolescent.  With OLDGNU_MAGIC, uname and gname are
       valid, though the header is not truly POSIX conforming */
    
    #define OLDGNU_MAGIC "ustar  "  /* 7 chars and a null */
    

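    In your second and third examples (Scenario 2 and Scenario 3), the magic field is the POSIX value (ustar followed by a null), but the version field holds an unexpected value: 0x20 0x20 in Scenario 2 and 0x00 0x00 in Scenario 3. According to the definitions above, the correct value is the ASCII string 00 (0x30 0x30 in hex), so this field is most likely ignored by the tools reading these archives.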
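
To make the distinction concrete, here is a minimal sketch in C that classifies a 512-byte header block by the eight bytes at offset 257. The constants are the ones quoted above; the function name tar_header_kind and the returned labels are hypothetical and not part of any tar implementation. It only illustrates the comparison, and real tools may well be more lenient.

```c
#include <string.h>

/* Values taken from the header definitions quoted above. */
#define MAGIC_OFFSET 257
#define TMAGIC       "ustar"    /* ustar and a null */
#define TMAGLEN      6
#define TVERSION     "00"       /* 00 and no null */
#define TVERSLEN     2
#define OLDGNU_MAGIC "ustar  "  /* 7 chars and a null */

/* Hypothetical helper: classify a 512-byte tar header block by its
   magic and version fields. The returned labels are illustrative only. */
static const char *tar_header_kind(const unsigned char *block)
{
    const unsigned char *magic = block + MAGIC_OFFSET;
    const unsigned char *version = magic + TMAGLEN;

    /* Old GNU format: magic and version together hold "ustar  \0". */
    if (memcmp(magic, OLDGNU_MAGIC, TMAGLEN + TVERSLEN) == 0)
        return "old GNU";

    /* POSIX requires "ustar" plus the terminating null (6 bytes). */
    if (memcmp(magic, TMAGIC, TMAGLEN) != 0)
        return "not ustar";

    /* POSIX requires the two ASCII digits "00" with no terminator. */
    if (memcmp(version, TVERSION, TVERSLEN) == 0)
        return "POSIX ustar";

    return "ustar magic, nonstandard version";
}
```

With this sketch, Scenario 1 is reported as "old GNU", while Scenario 2 and Scenario 3 both fall through to "ustar magic, nonstandard version".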