I tried different ways to archive a folder. My understanding was that a Python built-in module is always faster than subprocess.call("Linux command"), but in a quick test the tarfile module was slower than subprocess.call("tar"). Can someone explain why?
#!/usr/bin/python
import time
import tarfile
import subprocess

TestFolder = ["Jack", "Robin"]

# Method 1: the tarfile module
tStart1 = time.time()
for folder in TestFolder:
    name = "/mnt/ShareDrive/Share/ExistingUsers/" + folder
    path = "/mnt/TEST2/"
    tar = tarfile.open(path + folder + ".tar.gz", "w:gz")
    tar.add(name)
    tar.close()
tEnd1 = time.time()

time.sleep(2)

# Method 2: the tar command via subprocess
tStart2 = time.time()
for folder in TestFolder:
    path = "/mnt/TEST1/"
    subprocess.call(["tar", "zcvf", path + folder + ".tar.gz", "-P", "/mnt/ShareDrive/Share/ExistingUsers/" + folder])
tEnd2 = time.time()

# Parenthesized form works under both Python 2 and Python 3
print("The module cost %f sec" % (tEnd1 - tStart1))
print("The subprocess cost %f sec" % (tEnd2 - tStart2))
The tarfile module took 63 sec; the subprocess took only 32 sec.
The total size of the two folders is 433 MB.
tar is written in C. The tarfile module is a pure Python implementation of tar handling. There is no way that the module will be faster than the command.
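If you want to check this on your own machine, here is a rough timing sketch. It also tests one extra factor that may matter besides interpreter overhead: tarfile.open() in "w:gz" mode defaults to compresslevel=9, while the gzip program that tar normally invokes defaults to level 6, so the two runs in the question were probably not compressing equally hard. The sample folder, the file names, and the helper functions (make_sample_tree, tar_with_module, tar_with_command, compare) are made up for illustration, and the code is written for Python 3.

```python
import os
import shutil
import subprocess
import tarfile
import tempfile
import time


def make_sample_tree(root, n_files=20, lines_per_file=2000):
    """Create a small, hypothetical folder of text files to archive."""
    src = os.path.join(root, "sample")
    os.makedirs(src)
    for i in range(n_files):
        with open(os.path.join(src, "file%03d.txt" % i), "wb") as fh:
            fh.write(("line of sample data %d\n" % i).encode() * lines_per_file)
    return src


def tar_with_module(src, dest, level):
    """Archive src with the tarfile module at the given gzip level."""
    start = time.time()
    with tarfile.open(dest, "w:gz", compresslevel=level) as tar:
        tar.add(src, arcname=os.path.basename(src))
    return time.time() - start


def tar_with_command(src, dest):
    """Archive src with the external tar command (no 'v', so nothing is printed)."""
    start = time.time()
    subprocess.check_call(
        ["tar", "-czf", dest, "-C", os.path.dirname(src), os.path.basename(src)]
    )
    return time.time() - start


def compare(src, out_dir):
    """Time tarfile at levels 9 and 6, plus the tar command if it is installed."""
    results = {}
    for level in (9, 6):
        dest = os.path.join(out_dir, "module_level%d.tar.gz" % level)
        results["tarfile level %d" % level] = tar_with_module(src, dest, level)
    if shutil.which("tar"):
        dest = os.path.join(out_dir, "command.tar.gz")
        results["tar command"] = tar_with_command(src, dest)
    return results


if __name__ == "__main__":
    work = tempfile.mkdtemp()
    try:
        src = make_sample_tree(work)
        for label, seconds in compare(src, work).items():
            print("%-16s %.3f sec" % (label, seconds))
    finally:
        shutil.rmtree(work)
```

To try it on the real data, point src at one of your folders instead of the generated sample. Note that the question's command used the v flag, which prints every file name; the sketch leaves it out so that terminal output does not affect the timing.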