Untar file in Python script with wildcard


I am trying, in a Python script, to import a tar.gz file from HDFS and then untar it. The file has a name like 20160822073413-EoRcGvXMDIB5SVenEyD4pOEADPVPhPsg.tar.gz and always has the same structure.

In my Python script, I would like to copy it locally and then extract it. I am using the following code to do this:

import subprocess
import os
import datetime
import time

today = time.strftime("%Y%m%d")

#Copy tar file from HDFS to local server
args = ["hadoop","fs","-copyToLocal", "/locationfile/" + today + "*"]

p=subprocess.Popen(args)

p.wait()

#Untar the CSV file 
args = ["tar","-xzvf",today + "*"]

p=subprocess.Popen(args)

p.wait()

The copy from HDFS works perfectly, but I am not able to extract the file. I am getting the following error:

['tar', '-xzvf', '20160822*.tar']
tar (child): 20160822*.tar: Cannot open: No such file or directory
tar (child): Error is not recoverable: exiting now
tar: Child returned status 2
tar: Error is not recoverable: exiting now
put: `reportResults.csv': No such file or directory

Can anyone help me?

Thanks a lot!


Solution

  • I found a way to do what I needed: instead of calling the external tar command, I used Python's built-in tarfile module, and it works!

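    import glob
    import os
    import tarfile

    os.chdir("/folder_to_scan/")
    # glob expands the wildcard in Python, so no shell is needed
    for file in glob.glob("*.tar.gz"):
        print(file)
        # Extract each matching archive into the current directory
        with tarfile.open(file) as tar:
            tar.extractall()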
    

    Hope this helps.

    Regards Majid
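
A note on why the original command failed: when subprocess.Popen is given an argument list (without shell=True), no shell is involved, so the * is passed to tar literally instead of being expanded into file names. Expanding the pattern in Python with glob avoids that. Below is a minimal sketch that combines the date prefix from the question with the tarfile approach above. The function name extract_todays_archives and its parameters src_dir, dest_dir, and prefix are hypothetical names chosen for illustration; they are not part of the original answer.

```python
import glob
import os
import tarfile
import time


def extract_todays_archives(src_dir, dest_dir, prefix=None):
    """Extract every <prefix>*.tar.gz found in src_dir into dest_dir.

    Hypothetical helper: returns the list of archive paths it extracted.
    """
    if prefix is None:
        # Same date prefix as in the question, e.g. "20160822"
        prefix = time.strftime("%Y%m%d")

    # glob.glob() expands the wildcard in Python; no shell is involved.
    # glob.escape() keeps special characters in src_dir from being
    # treated as part of the pattern.
    pattern = os.path.join(glob.escape(src_dir), prefix + "*.tar.gz")
    archives = sorted(glob.glob(pattern))

    for path in archives:
        # "r:gz" opens a gzip-compressed tar archive for reading
        with tarfile.open(path, "r:gz") as tar:
            tar.extractall(dest_dir)

    return archives
```

It could be called as extract_todays_archives("/folder_to_scan/", "."), assuming the archive was already copied into that folder. An empty returned list means nothing matched the pattern, which is easier to diagnose than tar's "Cannot open" error.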