
Extract only a single directory from tar (in python)


I am working on a project in Python in which I need to extract only a subfolder of a tar archive, not all the files. I tried to use:

import tarfile

tar = tarfile.open("archive.tar")  # placeholder archive path
tar.extract("dirname", "targetdir")

But this does not work: it does not extract the given subdirectory, and no exception is thrown. I am a beginner in Python. Also, if the above method doesn't work for directories, what is the difference between it and tar.extractfile()?
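To make the second part of the question concrete, here is a minimal sketch of what each method does with a directory member. It assumes a small throwaway archive built in memory; the member names `dirname` and `dirname/hello.txt` are made up for illustration.

```python
import io
import os
import tarfile
import tempfile


def build_sample_archive():
    """Build an in-memory tar with one directory and one file inside it."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        dir_info = tarfile.TarInfo("dirname")  # hypothetical directory name
        dir_info.type = tarfile.DIRTYPE
        dir_info.mode = 0o755
        tar.addfile(dir_info)

        data = b"hello\n"
        file_info = tarfile.TarInfo("dirname/hello.txt")  # hypothetical file
        file_info.size = len(data)
        tar.addfile(file_info, io.BytesIO(data))
    buf.seek(0)
    return buf


with tarfile.open(fileobj=build_sample_archive()) as tar, \
        tempfile.TemporaryDirectory() as target:
    # extract() writes exactly one member to disk. For a directory member
    # that is only the (empty) directory, not the files stored under it.
    tar.extract("dirname", target)
    extracted = sorted(os.listdir(os.path.join(target, "dirname")))

    # extractfile() writes nothing to disk: it returns a readable file
    # object for a regular file member and None for a directory member.
    dir_handle = tar.extractfile("dirname")
    file_contents = tar.extractfile("dirname/hello.txt").read()

print(extracted)      # []
print(dir_handle)     # None
print(file_contents)  # b'hello\n'
```

In short, extract() materialises a single member on disk, while extractfile() hands back the contents of a single file for reading in memory; neither one recurses into a directory.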


Solution

  • Building on the second example from the tarfile module documentation, you can extract the contained subfolder and all of its contents with something like this:

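    import tarfile

    with tarfile.open("sample.tar") as tar:
        # Keep the "subfolder" directory entry itself plus everything below it
        # (directory members are usually stored without a trailing slash).
        subdir_and_files = [
            tarinfo for tarinfo in tar.getmembers()
            if tarinfo.name == "subfolder"
            or tarinfo.name.startswith("subfolder/")
        ]
        # Extract only the selected members into the current directory.
        tar.extractall(members=subdir_and_files)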
    

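    This builds a list of the subfolder and its contents, then passes it to extractall(), which the tarfile documentation suggests using in preference to extract(), so that only those members are extracted. Replace "subfolder" with the actual path (relative to the root of the tar file) of the subfolder you want to extract. If nothing is extracted, check the stored names with tar.getnames(): some archives prefix every member with "./".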
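    The same idea can be wrapped in a generator, in the style of the documentation example. This is a minimal sketch, not a definitive implementation: members_under() and extract_subdir() are hypothetical helper names, matching ignores a leading "./", and the filter="data" argument is only passed on Python versions that ship tarfile extraction filters (3.12+, plus some security backports), hence the hasattr() check.

```python
import io
import os
import tarfile
import tempfile


def members_under(tar, prefix):
    """Yield the directory `prefix` itself and every member below it."""
    prefix = prefix.strip("/")
    for tarinfo in tar.getmembers():
        name = tarinfo.name
        if name.startswith("./"):  # tolerate archives created from "./..."
            name = name[2:]
        if name == prefix or name.startswith(prefix + "/"):
            yield tarinfo


def extract_subdir(archive_path, prefix, target_dir):
    """Extract only `prefix` and its contents from the archive."""
    with tarfile.open(archive_path) as tar:
        members = members_under(tar, prefix)
        if hasattr(tarfile, "data_filter"):
            # Reject absolute paths, ".." components, device files, etc.
            tar.extractall(target_dir, members=members, filter="data")
        else:
            tar.extractall(target_dir, members=members)


def demo_extract_subdir():
    """Build a sample archive, extract "subfolder", list what landed on disk."""
    with tempfile.TemporaryDirectory() as work:
        archive = os.path.join(work, "sample.tar")
        with tarfile.open(archive, "w") as tar:
            for name in ("subfolder/a.txt", "subfolder/inner/b.txt", "other/c.txt"):
                data = name.encode()
                info = tarfile.TarInfo(name)
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))

        out = os.path.join(work, "out")
        extract_subdir(archive, "subfolder", out)
        return sorted(
            os.path.relpath(os.path.join(root, f), out).replace(os.sep, "/")
            for root, _, files in os.walk(out)
            for f in files
        )


if __name__ == "__main__":
    print(demo_extract_subdir())  # ['subfolder/a.txt', 'subfolder/inner/b.txt']
```

    Because extractall() creates missing parent directories on its own, this also works for archives that contain file entries but no separate directory entries.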