Extra padding bytes in the file generated by cpio


I have a list of files in a directory and I want to pack them into a single archive file. I used cpio to create the archive like this:

ls |  cpio -ov -H crc > demo.cpio

I read the headers with a cpio structure like this:

struct cpio_newc_header {
        char    c_magic[6];
        char    c_ino[8];
        char    c_mode[8];
        char    c_uid[8];
        char    c_gid[8];
        char    c_nlink[8];
        char    c_mtime[8];
        char    c_filesize[8];
        char    c_devmajor[8];
        char    c_devminor[8];
        char    c_rdevmajor[8];
        char    c_rdevminor[8];
        char    c_namesize[8];
        char    c_check[8];
};

I am able to read the metadata, the pathname, and the file data by using c_namesize and c_filesize. However, after the file data there are some extra padding bytes, i.e. between the end of the file data and the next header.

00000230: 6e63 6965 7322 3a5b 5d0d 0a7d 0d0a 0000  ncies":[]..}....
00000240: 3037 3037 3032 3030 3636 4246 3838 3030  0707020066BF8800

Here we can see that some extra bytes are padded after the final '}' and its line break. I thought the data was being rounded up to a multiple of four bytes, but I found another case that does not look like a multiple of four:

00000450: 2066 6f72 2063 7279 7074 6f20 7665 7269   for crypto veri
00000460: 6669 6361 7469 6f6e 0a00 0000 3037 3037  fication....0707

Why are these extra bytes added? Can the padding be avoided when creating the archive with cpio?


Solution

  • From the manpage of cpio (section New ASCII Format):

    The pathname is followed by NUL bytes so that the total size of the fixed header plus pathname is a multiple of four. Likewise, the file data is padded to a multiple of four bytes. Note that this format supports only 4 gigabyte files (unlike the older ASCII format, which supports 8 gigabyte files).

    See also man 5 cpio
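
    The two padding rules quoted above can be sketched in C roughly as follows. This is only an illustration, not code from cpio itself: the helper names parse_hex8, align4, data_offset and next_header_offset are made up for this example, and it assumes the header has already been read into memory at a known archive offset. Each numeric field is 8 ASCII hex digits, and c_namesize counts the terminating NUL of the pathname.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct cpio_newc_header {
        char    c_magic[6];
        char    c_ino[8];
        char    c_mode[8];
        char    c_uid[8];
        char    c_gid[8];
        char    c_nlink[8];
        char    c_mtime[8];
        char    c_filesize[8];
        char    c_devmajor[8];
        char    c_devminor[8];
        char    c_rdevmajor[8];
        char    c_rdevminor[8];
        char    c_namesize[8];
        char    c_check[8];
};

/* 6 magic bytes + 13 fields of 8 hex digits = 110 bytes (C11 check). */
static_assert(sizeof(struct cpio_newc_header) == 110, "unexpected header size");

/* Hypothetical helper: parse one 8-digit hex field (not NUL-terminated). */
static unsigned long parse_hex8(const char field[8])
{
        char buf[9];

        memcpy(buf, field, 8);
        buf[8] = '\0';
        return strtoul(buf, NULL, 16);
}

/* Hypothetical helper: round an offset up to the next multiple of four. */
static size_t align4(size_t x)
{
        return (x + 3) & ~(size_t)3;
}

/* Header plus pathname is padded to a multiple of four; data starts there. */
static size_t data_offset(size_t hdr_off, const struct cpio_newc_header *h)
{
        return align4(hdr_off + sizeof *h + parse_hex8(h->c_namesize));
}

/* File data is padded to a multiple of four; the next header starts there. */
static size_t next_header_offset(size_t hdr_off, const struct cpio_newc_header *h)
{
        return align4(data_offset(hdr_off, h) + parse_hex8(h->c_filesize));
}
```

    For a header at offset 0 with a 9-byte name (including the NUL) and 10 bytes of data, this gives a data offset of 120 and a next-header offset of 132.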

    In your second example, the data is also padded to a 4-byte boundary:

    00000460: 6669 6361 7469 6f6e 0a00 0000 3037 3037  fication....0707
    

    As you can see, the last data byte is at 0x468 and three zero bytes of padding follow (0x469 to 0x46b), so the next header can start at 0x46c, which is a multiple of four.

    This padding is probably performed to avoid unaligned access to header fields after reading it into memory. It is part of the specification, so there is no option to avoid it.

    But the padding is easy to calculate. If x is the offset of the first byte after the end of the file data, then the next header begins at offset

    int nextheader = (x+3)&~3;
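
    As a quick sanity check, the same rounding can be applied to the two hex dumps from the question. The sketch below is only illustrative; round_up4, padding_after and check_against_dumps are names invented here, and the offsets are read off the dumps by hand (the first byte after the data is 0x23e in the first dump and 0x469 in the second).

```c
#include <assert.h>
#include <stddef.h>

/* Round x up to the next multiple of four; aligned values are unchanged. */
static size_t round_up4(size_t x)
{
        return (x + 3) & ~(size_t)3;
}

/* Number of NUL padding bytes that follow data ending just before x. */
static size_t padding_after(size_t x)
{
        return round_up4(x) - x;
}

/* Worked examples taken from the two hex dumps in the question. */
static void check_against_dumps(void)
{
        /* First dump: data ends before 0x23e, header magic at 0x240. */
        assert(round_up4(0x23e) == 0x240);
        assert(padding_after(0x23e) == 2);

        /* Second dump: data ends before 0x469, header magic at 0x46c. */
        assert(round_up4(0x469) == 0x46c);
        assert(padding_after(0x469) == 3);

        /* Data that already ends on a boundary gets no padding. */
        assert(padding_after(0x240) == 0);
}
```

    So both dumps follow the same rule: two padding bytes in the first case and three in the second, because the padding depends on where the data happens to end, not on a fixed count.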