Corrupted TAR File Error Upon Access From Google Cloud Storage in Java


I am storing a TAR file in Google Cloud Storage. The file downloads successfully via gsutil and extracts on my computer with macOS Archive Utility. However, my Java program always throws java.io.IOException: Corrupted TAR archive when it accesses the file. I have tried several approaches, all of them using the org.apache.commons:commons-compress library. Can you give me some insight into how to fix this problem, or suggest something else I can try?

Here are the implementations that I have tried:

Blob blob = storage.get(BUCKET_NAME, FILE_PATH);
blob.downloadTo(Paths.get("filename.tar"));
String contentType = blob.getContentType(); // application/x-tar

// Attempt 1: stream directly from the blob
InputStream is = Channels.newInputStream(blob.reader());
String mime = URLConnection.guessContentTypeFromStream(is); // null
TarArchiveInputStream ais = new TarArchiveInputStream(is);
ais.getNextEntry(); // throws java.io.IOException: Corrupted TAR archive

// Attempt 2: load the whole blob into memory
InputStream is2 = new ByteArrayInputStream(blob.getContent());
String mime2 = URLConnection.guessContentTypeFromStream(is2); // null
TarArchiveInputStream ais2 = new TarArchiveInputStream(is2);
ais2.getNextEntry(); // throws java.io.IOException: Corrupted TAR archive

// Attempt 3: read the downloaded file
InputStream is3 = new FileInputStream("filename.tar");
String mime3 = URLConnection.guessContentTypeFromStream(is3); // null
TarArchiveInputStream ais3 = new TarArchiveInputStream(is3);
ais3.getNextEntry(); // throws java.io.IOException: Corrupted TAR archive

// Attempt 4: use TarFile instead of TarArchiveInputStream
TarFile file = new TarFile(blob.getContent()); // throws java.io.IOException: Corrupted TAR archive
TarFile tarFile = new TarFile(Paths.get("filename.tar")); // throws java.io.IOException: Corrupted TAR archive

Additional note: I have tried parsing a JSON file from GCS and it works fine.

Blob blob = storage.get(BUCKET_NAME, FILE_PATH);
JSONTokener jt = new JSONTokener(Channels.newInputStream(blob.reader()));
JSONObject jo = new JSONObject(jt);

Solution

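  • The problem is that your TAR file is gzip-compressed: despite the .tar name, it is really a .tgz (tar.gz) file.

    For that reason, you need to decompress the stream before reading the TAR entries. Desktop tools such as Archive Utility typically detect the compression on their own, which would explain why extraction worked on your computer, whereas TarArchiveInputStream expects uncompressed TAR data.

    Please consider the following example; note the use of the GzipCompressorInputStream class that ships with Commons Compress: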

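    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.OutputStream;
    import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
    import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
    import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
    import org.apache.commons.compress.utils.IOUtils;

    public static void main(String... args) {
      // Named .tar, but the content is gzip-compressed
      final File archiveFile = new File("latest.tar");
      try (
          FileInputStream in = new FileInputStream(archiveFile);
          // Decompress first, then read the TAR entries from the decompressed stream
          GzipCompressorInputStream gzIn = new GzipCompressorInputStream(in);
          TarArchiveInputStream tarIn = new TarArchiveInputStream(gzIn)
      ) {
        TarArchiveEntry tarEntry = tarIn.getNextTarEntry();
        while (tarEntry != null) {
          // Entry names are trusted here; validate them before extracting untrusted archives
          final File path = new File("/tmp", tarEntry.getName());
          if (!path.getParentFile().exists()) {
            path.getParentFile().mkdirs();
          }

          if (!tarEntry.isDirectory()) {
            // tarIn is positioned at the current entry's data
            try (OutputStream out = new FileOutputStream(path)) {
              IOUtils.copy(tarIn, out);
            }
          }
          tarEntry = tarIn.getNextTarEntry();
        }
      } catch (IOException e) {
        // Also covers FileNotFoundException, which is a subclass of IOException
        e.printStackTrace();
      }
    }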
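
If you are not sure whether a given object is compressed, one option is to look at its first two bytes: gzip data starts with 0x1f 0x8b. The sketch below uses only the JDK; the class name GzipSniffer and both method names are made up for illustration and are not part of Commons Compress or the GCS client.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;

// Hypothetical helper: detects gzip data by its magic bytes.
final class GzipSniffer {

    // Returns true if the stream starts with the gzip magic bytes 0x1f 0x8b.
    // The stream must support mark/reset so the peeked bytes are not consumed.
    static boolean looksLikeGzip(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(2);
        int b1 = in.read();
        int b2 = in.read();
        in.reset();
        return b1 == 0x1f && b2 == 0x8b;
    }

    // Wraps the raw stream in a GZIPInputStream only when it looks gzip-compressed;
    // otherwise returns a buffered view of the raw stream unchanged.
    static InputStream maybeDecompress(InputStream raw) throws IOException {
        BufferedInputStream buffered = new BufferedInputStream(raw);
        return looksLikeGzip(buffered) ? new GZIPInputStream(buffered) : buffered;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: GzipSniffer <file>");
            return;
        }
        try (InputStream in = new BufferedInputStream(Files.newInputStream(Paths.get(args[0])))) {
            System.out.println(looksLikeGzip(in) ? "gzip-compressed" : "not gzip");
        }
    }
}
```

The stream returned by maybeDecompress could then be handed to new TarArchiveInputStream(...). For an object in GCS, the raw stream might come from Channels.newInputStream(blob.reader()), as in the question.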