I am storing a TAR file in Google Cloud Storage. The file can be downloaded successfully via gsutil
and extracted on my computer using the macOS Archive Utility. However, the Java program that I implemented always encounters java.io.IOException: Corrupted TAR archive
when accessing the file. I have tried several approaches, all of which use the org.apache.commons:commons-compress
library. Can you give me some insight into how to fix this problem, or suggest something I can try?
Here are the implementations that I have tried:
// Setup: fetch the blob and download a local copy
Blob blob = storage.get(BUCKET_NAME, FILE_PATH);
blob.downloadTo(Paths.get("filename.tar"));
String contentType = blob.getContentType(); // application/x-tar

// Attempt 1: stream the blob directly from GCS
InputStream is = Channels.newInputStream(blob.reader());
String mime = URLConnection.guessContentTypeFromStream(is); // null
TarArchiveInputStream ais = new TarArchiveInputStream(is);
ais.getNextEntry(); // throws java.io.IOException: Corrupted TAR archive

// Attempt 2: read the blob content from memory
InputStream is2 = new ByteArrayInputStream(blob.getContent());
String mime2 = URLConnection.guessContentTypeFromStream(is2); // null
TarArchiveInputStream ais2 = new TarArchiveInputStream(is2);
ais2.getNextEntry(); // throws java.io.IOException: Corrupted TAR archive

// Attempt 3: read the downloaded file
InputStream is3 = new FileInputStream("filename.tar");
String mime3 = URLConnection.guessContentTypeFromStream(is3); // null
TarArchiveInputStream ais3 = new TarArchiveInputStream(is3);
ais3.getNextEntry(); // throws java.io.IOException: Corrupted TAR archive

// Attempt 4: TarFile from the in-memory content
TarFile file = new TarFile(blob.getContent()); // throws java.io.IOException: Corrupted TAR archive

// Attempt 5: TarFile from the downloaded file
TarFile tarFile = new TarFile(Paths.get("filename.tar")); // throws java.io.IOException: Corrupted TAR archive
Addition: I have also tried parsing a JSON file from GCS, and that works fine.
Blob blob = storage.get(BUCKET_NAME, FILE_PATH);
JSONTokener jt = new JSONTokener(Channels.newInputStream(blob.reader()));
JSONObject jo = new JSONObject(jt);
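The problem is that your tar is compressed: it is actually a tgz file, that is, a gzip-compressed tar.
Tools such as macOS Archive Utility detect the compression automatically, which is why extraction works there, whereas TarArchiveInputStream and TarFile expect uncompressed tar data.
For that reason, you need to decompress the file before processing your tar contents.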
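If you want to confirm this before changing your reading code, a minimal sketch like the one below checks whether a stream starts with the gzip magic bytes (0x1f 0x8b) and only adds a decompression layer when it does. It uses only the JDK (java.util.zip) rather than Commons Compress, and the names GzipSniffer, looksLikeGzip and unwrapIfGzip are hypothetical, chosen for illustration.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, not part of Commons Compress or the GCS client.
final class GzipSniffer {

    // Returns true if the stream starts with the gzip magic bytes 0x1f 0x8b.
    // The stream must support mark/reset so the peeked bytes are not consumed.
    static boolean looksLikeGzip(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(2);
        int b0 = in.read();
        int b1 = in.read();
        in.reset();
        return b0 == 0x1f && b1 == 0x8b;
    }

    // Wraps the raw stream in a GZIPInputStream only when it looks gzip-compressed.
    static InputStream unwrapIfGzip(InputStream raw) throws IOException {
        BufferedInputStream buffered = new BufferedInputStream(raw);
        return looksLikeGzip(buffered) ? new GZIPInputStream(buffered) : buffered;
    }

    public static void main(String[] args) throws IOException {
        // Build a small gzip payload in memory to stand in for the downloaded file.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write("hello".getBytes(StandardCharsets.UTF_8));
        }
        byte[] compressed = bos.toByteArray();

        System.out.println(looksLikeGzip(new ByteArrayInputStream(compressed)));
        try (InputStream in = unwrapIfGzip(new ByteArrayInputStream(compressed))) {
            // readAllBytes requires Java 9 or later
            System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }
    }
}
```

Under these assumptions, the stream returned by unwrapIfGzip could be handed to TarArchiveInputStream in place of the raw stream, for example one obtained from Channels.newInputStream(blob.reader()).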
Please consider the following example; note the use of the GzipCompressorInputStream class that is built into Commons Compress:
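import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
// Commons Compress's IOUtils; IOUtils.copy from Commons IO would work as well
import org.apache.commons.compress.utils.IOUtils;

public class ExtractTarGz {
    public static void main(String... args) {
        final File archiveFile = new File("latest.tar");
        try (
            FileInputStream in = new FileInputStream(archiveFile);
            // Decompress the gzip layer first, then read the tar entries from it
            GzipCompressorInputStream gzIn = new GzipCompressorInputStream(in);
            TarArchiveInputStream tarIn = new TarArchiveInputStream(gzIn)
        ) {
            TarArchiveEntry tarEntry = tarIn.getNextTarEntry();
            while (tarEntry != null) {
                // Extract every entry under /tmp
                final File path = new File("/tmp", tarEntry.getName());
                if (!path.getParentFile().exists()) {
                    path.getParentFile().mkdirs();
                }
                if (!tarEntry.isDirectory()) {
                    // tarIn is positioned at the current entry's data
                    try (OutputStream out = new FileOutputStream(path)) {
                        IOUtils.copy(tarIn, out);
                    }
                }
                tarEntry = tarIn.getNextTarEntry();
            }
        } catch (IOException e) {
            // Also covers FileNotFoundException, which is a subclass of IOException
            e.printStackTrace();
        }
    }
}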
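Because the entry names in the example above come from the archive itself, an optional hardening step is to check that each resolved path stays inside the target directory before writing to it. The sketch below uses only java.nio.file; SafeExtract and resolveInside are hypothetical names, and it assumes a Unix-style target directory such as /tmp.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class, not part of Commons Compress.
final class SafeExtract {

    // Resolves entryName against targetDir and rejects names such as
    // "../evil" or absolute paths that would land outside targetDir.
    static Path resolveInside(Path targetDir, String entryName) throws IOException {
        Path base = targetDir.toAbsolutePath().normalize();
        Path resolved = base.resolve(entryName).normalize();
        if (!resolved.startsWith(base)) {
            throw new IOException("Entry escapes target directory: " + entryName);
        }
        return resolved;
    }

    public static void main(String[] args) throws IOException {
        Path target = Paths.get("/tmp");
        System.out.println(resolveInside(target, "dir/file.txt"));
        try {
            resolveInside(target, "../outside.txt");
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In the extraction loop, the File for each entry could then be obtained with resolveInside(Paths.get("/tmp"), tarEntry.getName()).toFile() instead of being built directly from the entry name.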