java compression archive corruption bzip2

Check compressed archive for corruption


I am creating compressed archives with tar and bzip2 using jarchivelib, which is built on org.apache.commons.compress.

try {
    Archiver archiver = ArchiverFactory.createArchiver(ArchiveFormat.TAR, CompressionType.BZIP2);
    File archive = archiver.create(archiveName, destination, sourceFilesArr);
} catch (IOException e) {
    e.printStackTrace();
}

Sometimes the created file appears to be corrupted, so I want to check for that and recreate the archive if necessary. No error is thrown; I noticed the corruption when I tried to extract the archive manually with tar -xf file.tar.bz2. (Note: extracting with tar -xjf file.tar.bz2 works flawlessly.)

tar: Archive contains `\2640\003\203\325@\0\0\0\003\336\274' where numeric off_t value expected
tar: Archive contains `\0l`\t\0\021\0' where numeric mode_t value expected
tar: Archive contains `\003\301\345\0\0\0\0\006\361\0p\340' where numeric time_t value expected
tar: Archive contains `\0\210\001\b\0\233\0' where numeric uid_t value expected
tar: Archive contains `l\001\210\0\210\001\263' where numeric gid_t value expected
tar: BZh91AY&SY"'ݛ\003\314>\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\343\262\037\017\205\360X\001\210: Unknown file type `', extracted as normal file
tar: BZh91AY&SY"'ݛ�>��������������������������������������X�: implausibly old time stamp 1970-01-01 00:59:59
tar: Skipping to next header
tar: Exiting with failure status due to previous errors

Is there a way, using org.apache.commons.compress, to check whether a compressed archive is corrupted? Since the files can be several GB in size, an approach that does not require decompressing them would be great.


Solution

  • Because bzip2 compression produces a stream, there is no way to check for corruption without decompressing that stream and passing the result to tar.

    In your case, however, the archive is not corrupted. You are handing the file directly to tar without first passing it through bzip2, and that is the root cause of the errors. Because the archive is compressed with bzip2, you always need to give tar the -j flag, which is why the second command works correctly.
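
    The "BZh91AY&SY" at the start of the names in the tar errors is the beginning of a bzip2 stream: the signature "BZh" followed by a block-size digit. If it helps to tell a bzip2-compressed file from a plain tar file before choosing how to extract it, a rough sketch could look at just the first four bytes. This uses only the JDK. The class name Bzip2HeaderCheck and the method looksLikeBzip2 are made up for illustration and are not part of jarchivelib or commons-compress. It only inspects the header; it does not verify the integrity of the archive, which still requires decompressing as described above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of jarchivelib or commons-compress.
final class Bzip2HeaderCheck {

    // True if the bytes start with "BZh" followed by a block-size digit '1'..'9'.
    static boolean looksLikeBzip2(byte[] header) {
        return header.length >= 4
                && header[0] == 'B'
                && header[1] == 'Z'
                && header[2] == 'h'
                && header[3] >= '1'
                && header[3] <= '9';
    }

    // Reads at most the first four bytes of the file; nothing is decompressed.
    static boolean looksLikeBzip2(Path file) throws IOException {
        byte[] header = new byte[4];
        int off = 0;
        try (InputStream in = Files.newInputStream(file)) {
            while (off < header.length) {
                int n = in.read(header, off, header.length - off);
                if (n < 0) {
                    break; // file is shorter than four bytes
                }
                off += n;
            }
        }
        return off == header.length && looksLikeBzip2(header);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java Bzip2HeaderCheck <file>");
            return;
        }
        Path archive = Paths.get(args[0]);
        System.out.println(looksLikeBzip2(archive)
                ? archive + " starts with a bzip2 header"
                : archive + " does not start with a bzip2 header");
    }
}
```

    A file that passes this check is bzip2-compressed and should be extracted with tar -xjf rather than tar -xf. A file that fails it is not a bzip2 stream at all, which is at least a cheap signal that something went wrong when the archive was created.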