python gzip tarfile

gzip.open("file.tar.gz", "rb") vs. tarfile.open("file.tar.gz"); extractall()


Assuming I have one file, 'file.txt', tarred and gzipped, what is the difference between:

    with tarfile.open('file.tar.gz') as tar:
        tar.extractall()
        with open('file.txt', 'rb') as f:
            x =  f.read()

and

    with gzip.open('file.tar.gz', 'rb') as f:
        x =  f.read()

In the first case I get output without strange hex characters, which do appear in the second. Does f.read() inside gzip.open read the actual .tar file rather than the plain file, and are those characters the tar file's headers?


Solution

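  • Your assumption is correct. tar simply concatenates files, each preceded by a header block, without compressing them, while gzip compresses a single stream of bytes and knows nothing about the files inside it. gzip.open only undoes the compression, so f.read() returns the raw .tar archive, headers included.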
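
To see the difference directly, here is a minimal sketch. It assumes a throwaway archive built in a temporary directory with a made-up payload (neither is from the question) and forces the ustar format so the header layout is predictable. It reads the archive both ways, and also shows tarfile's extractfile(), which returns the member's contents without writing anything to disk.

    import gzip
    import io
    import os
    import tarfile
    import tempfile

    # Hypothetical contents for file.txt, used only for this illustration.
    payload = b"hello from file.txt\n"

    with tempfile.TemporaryDirectory() as tmp:
        archive = os.path.join(tmp, "file.tar.gz")

        # Build a small file.tar.gz holding a single member named file.txt.
        with tarfile.open(archive, "w:gz", format=tarfile.USTAR_FORMAT) as tar:
            info = tarfile.TarInfo(name="file.txt")
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))

        # gzip.open only decompresses: the result is the raw .tar stream.
        with gzip.open(archive, "rb") as f:
            raw_tar = f.read()

        # tarfile.open decompresses and also parses the tar structure.
        with tarfile.open(archive, "r:gz") as tar:
            content = tar.extractfile("file.txt").read()

    # The first 512 bytes are the tar header; it begins with the member name.
    print(raw_tar[:8])
    # The file data follows the header block.
    print(raw_tar[512:512 + len(payload)] == payload)
    # extractfile() returns just the file contents, with no headers.
    print(content == payload)

The "strange characters" in the gzip.open result are that 512-byte header and the NUL padding that tar adds around the data.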