python tar tarfile

How to extract a specific file from a .tar archive in Python?


I have created a .tar file on a Linux machine as follows:

tar cvf test.tar test_folder/

where the test_folder contains some files as shown below:

test_folder 
|___ file1.jpg
|___ file2.jpg
|___ ...

I am unable to programmatically extract the individual files within the tar archive using Python. More specifically, I have tried the following:

import tarfile
with tarfile.open('test.tar', 'r:') as tar:
    img_file = tar.extractfile('test_folder/file1.jpg')
    # img_file contains the object: <ExFileObject name='test_folder/test.tar'>

Here, img_file does not seem to contain the requested image; it seems to contain the source .tar file instead. I am not sure where I am going wrong. Any suggestions would be really helpful. Thanks in advance.


Solution

  • Appending two lines to your code will solve your problem:

    import tarfile
    
    with tarfile.open('test.tar', 'r:') as tar:
        img_file = tar.extractfile('test_folder/file1.jpg')
        
        # --------------------- Add this ---------------------------
        with open("img_file.jpg", "wb") as outfile:
            outfile.write(img_file.read())
    

    The explanation:

    The .extractfile() method only gives you a file-like object from which you can read the content of the archive member (i.e. its data).

    It doesn't extract any file to the file system.

    So you have to do it yourself by reading the returned content (img_file.read()) and writing it into a file of your choice (outfile.write(...)).


    Or, to simplify your life, use the .extract() method instead. See my other answer.
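
To make the two options concrete, here is a minimal sketch of both. The helper names save_member and extract_member are made up for this illustration and are not part of tarfile. The first copies the member's data to a path you choose, streaming it instead of reading it all into memory. The second lets .extract() write the file itself, keeping the member's path under a destination directory. The filter argument is only passed when the running Python supports it, which is an assumption about your environment, not something the question requires.

```python
import os
import shutil
import tarfile


def save_member(tar_path, member_name, out_path):
    """Copy one archive member to out_path using extractfile()."""
    with tarfile.open(tar_path, "r:") as tar:
        # Raises KeyError if member_name is not in the archive.
        src = tar.extractfile(member_name)
        if src is None:
            # extractfile() returns None for members that are not
            # regular files or links (for example, directories).
            raise ValueError(f"{member_name!r} is not a regular file")
        with src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)  # copy in chunks


def extract_member(tar_path, member_name, dest_dir):
    """Let tarfile write the member under dest_dir using extract()."""
    # Newer Python versions support extraction filters; use the "data"
    # filter when available, otherwise fall back to the default behaviour.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(tar_path, "r:") as tar:
        tar.extract(member_name, path=dest_dir, **kwargs)
    # The member keeps its archive path below dest_dir.
    return os.path.join(dest_dir, member_name)
```

With the archive from the question, save_member('test.tar', 'test_folder/file1.jpg', 'img_file.jpg') would write img_file.jpg in the current directory, while extract_member('test.tar', 'test_folder/file1.jpg', 'out') would write out/test_folder/file1.jpg.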