
How to open a tarfile and get the data that's inside one of its files?


This is the code I tried:

import tarfile

# Open the gzip-compressed tar archive
t = tarfile.open(r'C:\Users\Luke\Desktop\my data.gz', "r:gz")

t.getmembers()  # show the members inside the tar archive

It prints this:

<TarInfo './._SA00000' at 0x2a9431ea430>,
<TarInfo 'SA00000' at 0x2a9431ea5c0>,   # there are more members; I didn't want to show them all
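The `./._SA00000` entry is most likely not the data itself: tar archives created on macOS usually include a `._` "AppleDouble" entry per real file, holding Finder metadata and extended attributes. A minimal sketch, assuming that is what these entries are, that lists only regular members and skips them. The helper names `real_member_names` and `build_demo_archive` are hypothetical, and the demo archive contains made-up data shaped like the one in the question:

```python
import io
import tarfile


def real_member_names(tar: tarfile.TarFile) -> list[str]:
    """Return regular-file member names, skipping macOS '._' metadata entries."""
    names = []
    for member in tar.getmembers():
        base = member.name.rsplit("/", 1)[-1]  # last path component, e.g. '._SA00000'
        if member.isfile() and not base.startswith("._"):
            names.append(member.name)
    return names


def build_demo_archive() -> bytes:
    """Build a tiny in-memory .tar.gz shaped like the question's archive (made-up contents)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, payload in [("./._SA00000", b"mac metadata"), ("SA00000", b"real data\n")]:
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


if __name__ == "__main__":
    with tarfile.open(fileobj=io.BytesIO(build_demo_archive()), mode="r:gz") as tar:
        print(real_member_names(tar))  # ['SA00000']
```

With the real archive, `real_member_names(t)` can be called on the `t` opened above.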

I tried:

x = t.extract('SA00000')

print(x)

It prints None.

I really don't understand. I've opened the tarfile in Notepad and all the data is there.

I don't know if this helps, but I'm using Python 3 on Windows 10, and the data was given to me from a Mac.
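The `None` in the question is expected: `TarFile.extract()` writes the member to disk (into the current working directory unless a `path` argument is given) and returns `None`, so `SA00000` has most likely already been written next to wherever the script ran. To get the contents in memory instead, `TarFile.extractfile()` returns a readable file object. A minimal sketch; `read_member` is a hypothetical helper name:

```python
import io
import tarfile


def read_member(archive_path: str, member_name: str) -> bytes:
    """Return the raw bytes of one archive member without writing anything to disk.

    TarFile.extract() writes the member to disk and returns None;
    TarFile.extractfile() returns a file object we can read from instead.
    """
    with tarfile.open(archive_path, mode="r:*") as tar:  # "r:*" auto-detects gzip/bz2/xz/none
        f = tar.extractfile(member_name)  # raises KeyError if the name is not in the archive
        if f is None:  # directories and other non-regular members carry no data
            raise ValueError(f"{member_name!r} is not a regular file")
        return f.read()
```

For example, `read_member(r'C:\Users\Luke\Desktop\my data.gz', 'SA00000')` returns the raw bytes. Call `.decode(...)` on them only if the file is text, using whatever encoding it was actually written in (UTF-8 would be an assumption).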


Solution

  • A tar file is just a Unix-style bundle of files with no compression; a tar.gz is the same bundle compressed with gzip. Whether you have a tar or a tar.gz file, you can view its contents simply by opening it in archive software (7-Zip recommended).

    If you're looking to do it programmatically, I would still recommend 7-Zip, but its standalone CLI version, 7za.exe (7-Zip Extra: standalone console version, 7z DLL, Plugin for Far Manager).

    This gives you all the functionality of 7-Zip without the GUI bloat. You can call it from your application; since you're using Python, you would likely use the subprocess module. Here is an example:

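    import subprocess

    def decompress():
        # Step 1: "e" strips the gzip layer. -o takes an output *folder*, with no space after it.
        # The inner .tar is normally named after the archive minus ".gz".
        subprocess.run(["7za.exe", "e", "-y", "-oOUTPUT_FOLDER_1", "INPUT_FILENAME_1.tar.gz"],
                       check=True, capture_output=True, text=True)
        # Step 2: "x" unpacks that .tar, keeping its internal folder structure.
        # check=True raises CalledProcessError on a non-zero exit code, so this only runs if step 1 succeeded.
        subprocess.run(["7za.exe", "x", "-y", "-oOUTPUT_FILE_NAME_2", r"OUTPUT_FOLDER_1\INPUT_FILENAME_1.tar"],
                       check=True, capture_output=True, text=True)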
    

    You will end up with a folder called OUTPUT_FILE_NAME_2, which you can then walk as if the tar/tar.gz file never existed. The first call decompresses the gzip layer of the "tarball" and the second call extracts the files from the resulting .tar.

    7za.exe reference
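Both points in this answer, that a .tar.gz is a tar bundle wrapped in gzip and that the unpacked result is just a folder to walk, can also be checked with Python's standard library (Python 3.9 or newer, which `tarfile.is_tarfile` needs in order to accept in-memory data). In the minimal sketch below, `describe_archive` looks for the gzip magic bytes before asking `tarfile` whether the decompressed content is a tar bundle, and `list_extracted_files` walks a folder such as `OUTPUT_FILE_NAME_2` after 7za has unpacked it. Both helper names are hypothetical:

```python
import gzip
import io
import os
import tarfile


def describe_archive(data: bytes) -> str:
    """Classify raw bytes as a plain tar, a gzip-compressed tar, or something else."""
    if data[:2] == b"\x1f\x8b":  # every gzip stream starts with these two magic bytes
        inner = gzip.decompress(data)  # strip the compression layer
        return "tar.gz" if tarfile.is_tarfile(io.BytesIO(inner)) else "gzip (not a tar)"
    return "tar" if tarfile.is_tarfile(io.BytesIO(data)) else "unknown"


def list_extracted_files(root: str) -> list[str]:
    """Return every file under an extraction folder, as sorted paths relative to root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(found)
```

Since `tarfile.open(..., "r:gz")` already opened the question's `my data.gz`, passing its bytes to `describe_archive` should report `tar.gz`. The whole file is read into memory here, which is fine for a sketch but not for very large archives.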