I'm learning Python threading and at the same time trying to improve my old untarring script.
The main part of it looks like this:
import tarfile, os, threading

def untar(fname, path):
    print "Untarring " + fname
    try:
        ut = tarfile.open(os.path.join(path, fname), "r:gz")
        ut.extractall(path)
        ut.close()
    except tarfile.ReadError as e:  # in case it's not gzipped
        print e
        ut = tarfile.open(os.path.join(path, fname), "r:*")
        ut.extractall(path)
        ut.close()
def untarFolder(path):
    if path == ".":
        path = os.getcwd()
    print "path", path
    ListTarFiles = serveMenu(path)  # function that scans the folder for
                                    # .tar and .tar.gz files and returns
                                    # a list of them (sketched below)
    print "ListTarFiles ", ListTarFiles
    for filename in ListTarFiles:
        print "filename: ", filename
        t = threading.Thread(target=untar, args=(filename, path))
        t.daemon = True
        t.start()
        print "Thread:", t
So the goal is to untar all the files in a given folder in parallel rather than one by one. Is that possible?
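serveMenu itself isn't shown above; a minimal sketch of what it does (filter the folder listing by extension; the real version may differ) could look like this:

def serveMenu(path):
    # stand-in sketch for the real serveMenu: collect anything
    # in the folder that looks like a tar archive
    return [f for f in os.listdir(path)
            if f.endswith((".tar", ".tar.gz", ".tgz"))]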
Output:
bogard@testlab:~/Toolz/untar$ python untar01.py -f .
path /home/bogard/Toolz/untar
ListTarFiles ['tar1.tgz', 'tar2.tgz', 'tar3.tgz']
filename: tar1.tgz
Untarring tar1.tgz
Thread: <Thread(Thread-1, started daemon 140042104731392)>
filename: tar2.tgz
Untarring tar2.tgz
Thread: <Thread(Thread-2, started daemon 140042096338688)>
filename: tar3.tgz
Untarring tar3.tgz
Thread: <Thread(Thread-3, started daemon 140042087945984)>
In output can see that script create threads but it doesn't untar any files. What's the catch?
What might be happening is that your script is returning before the threads actually complete. You can wait for a thread to complete with Thread.join(). Maybe try something like this:
threads = []
for filename in ListTarFiles:
    t = threading.Thread(target=untar, args=(filename, path))
    t.daemon = True
    threads.append(t)
    t.start()

# Wait for each thread to complete
for thread in threads:
    thread.join()
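Note that once you join the threads, the t.daemon = True line is no longer doing you any favors: the daemon flag is exactly what allowed your script to exit and kill the threads before they had extracted anything.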
Also, depending on the number of files you're untarring, you might want to limit the number of jobs you launch so that you're not trying to untar 1000 files at once. You could do this with something like multiprocessing.Pool.
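Here's a minimal sketch of that approach, reusing the untar function from above (the pool size of 4 is just an illustrative choice):

from multiprocessing import Pool

def untarFolder(path):
    if path == ".":
        path = os.getcwd()
    ListTarFiles = serveMenu(path)
    pool = Pool(processes=4)  # at most 4 archives are extracted at a time
    for filename in ListTarFiles:
        pool.apply_async(untar, args=(filename, path))
    pool.close()  # no more jobs will be submitted
    pool.join()   # block until every untar has finished

Since untarring is mostly disk I/O, multiprocessing.pool.ThreadPool (same interface, but backed by threads instead of processes) should work just as well here.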