bash, grep, tar, zcat

How to read 1TB zipped file in minimum time


I am trying to read the contents of a compressed archive using the command tar tf abc.tar.xz. Because the file is 1TB, this takes a very long time. I am not very familiar with bash scripting. I have tried other commands as well, such as zcat 3532642.tar.gz | more and tar tf 3532642.tar.xz | grep --regex="folder1/folder2/folder3/folder4/" and

tar tvf 3532642.tar.xz --to-command \
'grep --label="$TAR_FILENAME" -H folder1/folder2/folder3/folder4/ ; true'

But I don't see much difference among them in the time they take to read the file's contents.

Does anyone know how I can do this in minimum time for such a huge compressed file? Any help would be appreciated!


Solution

  • As rrauenza mentions, pigz may not work for the xz format, but there is a similar tool, pixz, for parallel, indexed xz compression and decompression.

    The pigz man page states that pigz compresses and decompresses using threads to make use of multiple processors and cores.

    Like pigz, pixz provides an option to set the number of threads that can run in parallel across cores. In the pigz man page the option reads:

    -p --processes n
    Allow up to n processes (default is the number of online processors)
    

    Or you can get the number of online cores manually with the bash command getconf _NPROCESSORS_ONLN and pass that value to -p.

    More details, including how to download and install it, are on the pixz GitHub page.
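
    As a rough sketch of the idea above, the core count can be read once and passed to -p. The archive name abc.tar.xz is taken from the question. The pipeline assumes pixz is installed and reads from standard input; when pixz or the file is missing, the else branch only prints the command. Parallel decompression and fast listing with pixz -l generally depend on the archive having been written by pixz in the first place, so an ordinary single-stream .tar.xz may still decompress on one core.

    ```shell
    # Number of online cores; fall back to 1 if getconf does not know the variable.
    cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

    # Archive name from the question; adjust to the real path.
    archive=abc.tar.xz

    if command -v pixz >/dev/null 2>&1 && [ -f "$archive" ]; then
        # Decompress with up to $cores threads and let tar list the member names.
        pixz -d -p "$cores" < "$archive" | tar -tf -
    else
        # Nothing to run here, so just show the command that would be used.
        echo "would run: pixz -d -p $cores < $archive | tar -tf -"
    fi
    ```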

    (or)

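    Going with a tar-only solution, this can be done only if the name of the file inside the archive is known in advance. Note that -f takes the archive as its argument, so the archive comes first and the member name follows:

    tar -zxOf <file-containing-tar> <file-name_inside-tar>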
    

    with the options as follows (for a .tar.xz archive, GNU tar uses -J, --xz in place of -z):

       -f, --file=ARCHIVE
              use archive file or device ARCHIVE
    
       -z, --gzip
              filter the archive through gzip
    
       -x, --extract, --get
              extract files from an archive
    
       -O, --to-stdout
              extract files to standard output
    

    This may not be as fast as pigz, but it nevertheless does the job.
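
To make the tar-only approach concrete, here is a small self-contained sketch. It builds a throwaway .tar.gz in a temporary directory and prints a single member to standard output. All file and directory names in it are made up for illustration; with a real archive only the tar -xzOf line is needed.

```shell
# Build a tiny demo archive in a temporary directory (all names are made up).
workdir=$(mktemp -d)
mkdir -p "$workdir/folder1/folder2"
printf 'hello from inside the archive\n' > "$workdir/folder1/folder2/data.txt"
printf 'another member\n' > "$workdir/folder1/other.txt"
tar -czf "$workdir/demo.tar.gz" -C "$workdir" folder1

# -f takes the archive as its argument; the member name follows.
# -O writes the member to stdout instead of creating a file on disk.
content=$(tar -xzOf "$workdir/demo.tar.gz" folder1/folder2/data.txt)
printf '%s\n' "$content"

# Clean up the demo files.
rm -rf "$workdir"
```

tar still has to decompress the stream from the beginning until it reaches the requested member, so on a 1TB archive this saves disk writes rather than decompression time. With GNU tar, the --occurrence option may let it stop after the first match instead of scanning to the end of the archive.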