python centos

Python tarfile slower than Linux command


I tried different ways to compress a folder. My understanding was that a Python built-in module is always faster than subprocess.call("Linux command"), but in a quick test the tarfile module was slower than subprocess.call("tar"). Can someone explain this to me?

    #!/usr/bin/python

    import time
    import tarfile
    import subprocess

    TestFolder = ["Jack", "Robin"]

    # Run 1: archive each folder with the tarfile module.
    tStart1 = time.time()
    for folder in TestFolder:
        name = "/mnt/ShareDrive/Share/ExistingUsers/" + folder
        path = "/mnt/TEST2/"
        # The with block closes the archive even if add() raises.
        with tarfile.open(path + folder + ".tar.gz", "w:gz") as tar:
            tar.add(name)
    tEnd1 = time.time()

    time.sleep(2)

    # Run 2: archive the same folders by calling the tar command.
    tStart2 = time.time()
    for folder in TestFolder:
        path = "/mnt/TEST1/"
        subprocess.call(["tar", "zcvf", path + folder + ".tar.gz", "-P",
                         "/mnt/ShareDrive/Share/ExistingUsers/" + folder])
    tEnd2 = time.time()

    # A parenthesized single-argument print works on Python 2 and 3.
    print("The module cost %f sec" % (tEnd1 - tStart1))
    print("The subprocess cost %f sec" % (tEnd2 - tStart2))

The tarfile module took 63 seconds. The subprocess took only 32 seconds.

The total size of the two folders is 433 MB.


Solution

  • tar is written in C, whereas the tarfile module is a pure Python implementation of tar handling, so the module cannot be expected to be faster than the command.
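One more factor may be worth checking before reading too much into the 63 vs 32 seconds: as far as I know, tarfile.open() with "w:gz" defaults to compresslevel=9, while tar's z option runs gzip at its default level, which is 6. The two runs in the question also write to different destinations, and the second run may benefit from files already cached by the first. Below is a minimal sketch (Python 3) of timing the tarfile module at different compression levels. The helper names make_targz and time_targz and the generated sample data are hypothetical, made up for illustration, and the numbers will vary by machine.

```python
import os
import tarfile
import tempfile
import time


def make_targz(src_dir, dest_path, compresslevel=6):
    """Archive src_dir into a gzip-compressed tar file at dest_path."""
    # compresslevel is passed through to the gzip layer (1 = fastest, 9 = smallest).
    with tarfile.open(dest_path, "w:gz", compresslevel=compresslevel) as tar:
        tar.add(src_dir, arcname=os.path.basename(src_dir))


def time_targz(src_dir, dest_path, compresslevel):
    """Return the wall-clock seconds taken to build the archive."""
    start = time.perf_counter()
    make_targz(src_dir, dest_path, compresslevel)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical sample data; replace src with a real folder to test.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "sample")
        os.mkdir(src)
        for i in range(10):
            with open(os.path.join(src, "file%d.txt" % i), "w") as f:
                f.write("some repetitive text\n" * 20000)

        # Compare the tarfile default (9) with the gzip default (6) and the fastest level (1).
        for level in (9, 6, 1):
            out = os.path.join(tmp, "sample-%d.tar.gz" % level)
            elapsed = time_targz(src, out, level)
            print("level %d: %.3f sec, %d bytes"
                  % (level, elapsed, os.path.getsize(out)))
```

If the gap narrows noticeably at level 6, part of the difference in the question comes from the compression setting rather than from Python itself. Timing both approaches against the same destination, and repeating each run, would make the comparison fairer still.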