
unzip operation taking several hours


I am using the following shell script to loop over 90 zip files and unarchive them on a Linux box hosted with Hostinger (shared web hosting):

#!/bin/bash

SOURCE_DIR="<path_to_archives>"

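# Quote the path and stop if the directory is missing, so the loop never runs in the wrong place
cd "${SOURCE_DIR}" || exit 1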

for f in *.zip
do
#   unzip -oqq "$f" -d "${f%.zip}" &
    python3 scripts/extract_archives.py "${f}" &
done
wait

The Python script called by the shell script above is:

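import os
import shutil
import sys

source_path = "<path to source dir>"

def extract_files(in_file):
    # os.path.join copes with a missing trailing slash on source_path;
    # os.path.splitext strips only the final ".zip", so names with extra dots stay intact.
    archive = os.path.join(source_path, in_file)
    shutil.unpack_archive(archive, os.path.splitext(archive)[0])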
    print('Extracted : ', in_file)


extract_files(sys.argv[1].strip())

Whether I use the built-in unzip command or the Python script, it takes about 2.5 hours to unzip all the files. Unarchiving all the zip files produces 90 folders containing 170,000 files overall. I would have thought that 15 to 20 minutes was a reasonable timeframe.
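
For comparison, the shell loop above starts all 90 extractions at once with `&`. The sketch below is one possible variation that caps the number of concurrent extractions, which can be gentler on a shared host. The names `extract_one`, `extract_all` and `max_workers` are hypothetical and not part of the original scripts, and nothing here is expected to help if the host itself throttles disk I/O.

```python
import zipfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def extract_one(archive):
    """Extract one zip into a sibling folder named after the archive."""
    archive = Path(archive)
    target = archive.with_suffix("")  # drops only the final ".zip"
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
        return archive.name, len(zf.namelist())


def extract_all(source_dir, max_workers=4):
    """Extract every *.zip in source_dir, running at most max_workers at a time."""
    archives = sorted(Path(source_dir).glob("*.zip"))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() keeps the results in the same order as the sorted archive list
        return list(pool.map(extract_one, archives))
```

Calling `extract_all("<path_to_archives>", max_workers=4)` returns a list of (archive name, entry count) pairs. The value 4 is an arbitrary starting point; timing a few different values on the actual host is the only way to tell whether concurrency matters there.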

I've tried a few variations. I tried tarring the folders instead of zipping them, thinking that un-tarring might be faster than unzipping. I also used the tar command on the source server to stream the files over ssh and untar them directly from the stream, something like this:

time tar zcf - . | ssh -p <port> user@host "tar xzf - -C <dest dir>"

Nothing is helping. I am open to using any other programming language like Perl, Go or others too if necessary to speed things up.

Could someone please help me solve this performance problem?


Solution

  • Thank you, everyone, for your answers. As you indicated, this was due to throttling on the servers in the hosted environment.
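
For a sense of scale, 170,000 files in 2.5 hours is roughly 19 files per second, while a 15 to 20 minute run would need roughly 140 to 190 files per second. A rough way to see what a given host sustains is to time the creation of many small files, as in the sketch below. The function name `small_file_write_rate` and its default values are hypothetical, and a low figure only hints at an I/O limit; it does not prove that the provider is throttling.

```python
import os
import tempfile
import time


def small_file_write_rate(directory, count=2000, size=4096):
    """Create `count` files of `size` bytes under `directory`; return files per second."""
    payload = b"\0" * size
    # The temporary folder and everything in it is removed when the block exits
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        start = time.perf_counter()
        for i in range(count):
            with open(os.path.join(tmp, f"f{i:06d}.bin"), "wb") as fh:
                fh.write(payload)
        elapsed = time.perf_counter() - start
    return count / elapsed
```

Running `small_file_write_rate("<dest dir>")` on the hosted server and on another machine gives two numbers to compare. A large gap between them would be consistent with the throttling described above.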