bash: extract only part of tar.gz archive


I have a very large .tar.gz file that I can't extract all at once because of a lack of disk space. I would like to extract half of its contents, process them, and then extract the remaining half.

The archive contains several subdirectories, which in turn contain files. When I extract a subdirectory, I need all its contents to be extracted with it.

What's the best way of doing this in bash? Does tar already allow this?


Solution

  • OK, so based on this answer, I can list all contents at the desired depth. In my case, the tar.gz file is structured as follows:

    archive.tar.gz:
    archive/
    archive/a/
    archive/a/file1
    archive/a/file2
    archive/a/file3
    archive/b/
    archive/b/file4
    archive/b/file5
    archive/c/
    archive/c/file6
    

    So I want to loop over the subdirectories a, b, and c and, for instance, extract the first two of them:

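    parent_folder='archive/'
    max_num=2
    counter=0
    mkdir -p "$parent_folder"
    # --exclude="*/*/*" limits the listing to the parent folder and its direct subdirectories.
    # Note: this loop splits on whitespace, so it assumes names without spaces.
    for subdir in $(tar --exclude="*/*/*" -tf archive.tar.gz); do
        if [ "$subdir" = "$parent_folder" ]; then
            echo 'not this one'
            continue
        fi
        if [ "$counter" -lt "$max_num" ]; then
            # Member names already start with archive/, so extracting into the
            # current directory recreates archive/<subdir>/ with all its contents
            tar -xzvf archive.tar.gz "$subdir"
            counter=$((counter + 1))
        fi
    done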
    

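    Next, once the first batch has been processed (and deleted, if the space is needed), extract the remaining subdirectories by skipping the first two: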

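    parent_folder='archive/'
    max_num=2
    counter=0
    mkdir -p "$parent_folder"
    for subdir in $(tar --exclude="*/*/*" -tf archive.tar.gz); do
        if [ "$subdir" = "$parent_folder" ]; then
            echo 'not this one'
            continue
        fi
        # Skip the first $max_num subdirectories, which were extracted in the first pass
        if [ "$counter" -ge "$max_num" ]; then
            tar -xzvf archive.tar.gz "$subdir"
        fi
        # Count every subdirectory, extracted or not, so the skip offset stays correct
        counter=$((counter + 1))
    done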
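
The two loops above differ only in which range of subdirectories they extract, so they could be folded into one helper. What follows is a minimal sketch, not part of the original answer: the function name `extract_batch` and its `skip`/`count` parameters are hypothetical, and it assumes GNU tar, an archive that stores explicit directory entries (listed as `archive/a/`), and subdirectories taken in the order they appear in the archive. It reads the listing line by line, so names containing spaces should also survive.

```shell
# Hypothetical helper: extract `count` second-level subdirectories from a
# .tar.gz archive, skipping the first `skip` of them.
# Usage: extract_batch ARCHIVE SKIP COUNT
extract_batch() (
    # The function body runs in a subshell, so these variables stay local.
    archive=$1
    skip=$2
    count=$3
    index=0

    # List entries at most two levels deep, one name per line.
    tar --exclude='*/*/*' -tzf "$archive" | while IFS= read -r member; do
        # Keep only directory entries of the form top/sub/ and skip
        # the top-level folder and any files sitting directly in it.
        case $member in
            */*/) ;;
            *) continue ;;
        esac

        if [ "$index" -ge "$skip" ] && [ "$index" -lt $((skip + count)) ]; then
            # Drop the trailing slash; tar then extracts the directory
            # together with everything beneath it.
            tar -xzf "$archive" "${member%/}"
        fi
        index=$((index + 1))
    done
)
```

With the layout shown earlier, `extract_batch archive.tar.gz 0 2` would correspond to the first loop and `extract_batch archive.tar.gz 2 2` to the second. Each extracted subdirectory triggers another pass over the compressed archive, which is slow for very large files but keeps disk usage limited to the batch being extracted.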