
Gunzip all the files present in source directory in Python


I have written code to gunzip all the files in the source folder. I want to add a check: if the gunzipped file doesn't exist yet, gunzip it; otherwise move on to the next file.

import glob
import gzip
import os

source_dir = "/Users/path"
dest_dir = "/Users/path/Documents/path"


for src_name in glob.glob(os.path.join(source_dir, '*.gz')):

    base = os.path.basename(src_name)
    dest_name = os.path.join(dest_dir, base[:-3])
    with gzip.open(src_name, 'rb') as infile, open(dest_name, 'wb') as outfile:
            try:
                for line in infile:
                    print ("outfile: %s" %outfile)
                    if not os.path.exists(dest_name):
                      outfile.write(line)
                      print( "converted: %s" %dest_name) 

            except EOFError:
                print("End of file error occurred.")

            except Exception:
                print("Some error occurred.")

I have used os.path.exists to check whether the file exists, but it doesn't seem to work here.


Solution

  • I think you have misplaced the os.path.exists call. It should be:

    import glob
    import gzip
    import os

    source_dir = "/Users/path"
    dest_dir = "/Users/path/Documents/path"
    
    
    for src_name in glob.glob(os.path.join(source_dir, '*.gz')):
    
        base = os.path.basename(src_name)
        dest_name = os.path.join(dest_dir, base[:-3])
    
        if not os.path.exists(dest_name):
            with gzip.open(src_name, 'rb') as infile, open(dest_name, 'wb') as outfile:
                try:
                    # Copy the decompressed data into the destination file.
                    for line in infile:
                        outfile.write(line)
                    # Report once per file, not once per line.
                    print("converted: %s" % dest_name)
    
                except EOFError:
                    print("End of file error occurred.")
    
                except Exception:
                    print("Some error occurred.")
    

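    Also, as @MadPhysicist emphasized: "doing the check after open(..., 'wb') (as you did in your original code), will always say that the file exists because that is what open(..., 'w') does". Opening a file in write mode creates it if it is missing (and truncates it otherwise), so by the time the check runs, dest_name already exists.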

    On top of that, even if you used some other check for whether gunzipping is needed, putting it inside the loop repeats the check for every line. That is redundant, because the answer (exists or not) is the same for every line of a given file.
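
For what it's worth, here is a minimal sketch of the same idea wrapped in a function, with the existence check done once per file before anything is opened for writing. The function name gunzip_missing and the ".part" temporary suffix are my own assumptions, not something from the question. Writing to a temporary name first is one possible design choice: with the loop above, a decompression error can leave a half-written dest_name behind, and the os.path.exists check would then skip that file on the next run.

```python
import glob
import gzip
import os
import shutil
import zlib


def gunzip_missing(source_dir, dest_dir):
    """Decompress each *.gz file in source_dir that has no output in dest_dir yet.

    Returns the list of destination paths that were newly written.
    (Hypothetical helper; the name and return value are illustrative.)
    """
    converted = []
    for src_name in sorted(glob.glob(os.path.join(source_dir, "*.gz"))):
        base = os.path.basename(src_name)
        dest_name = os.path.join(dest_dir, base[:-3])  # strip the ".gz" suffix

        # Check once per file, BEFORE opening dest_name for writing.
        if os.path.exists(dest_name):
            continue

        # Assumed convention: write to "<dest>.part" first, rename on success.
        tmp_name = dest_name + ".part"
        try:
            with gzip.open(src_name, "rb") as infile, open(tmp_name, "wb") as outfile:
                # Stream the decompressed bytes in chunks instead of line by line.
                shutil.copyfileobj(infile, outfile)
        except (OSError, EOFError, zlib.error) as exc:
            print("failed: %s (%s)" % (src_name, exc))
            # Remove the partial output so the file is retried next time.
            if os.path.exists(tmp_name):
                os.remove(tmp_name)
            continue

        os.replace(tmp_name, dest_name)
        converted.append(dest_name)
    return converted
```

Calling gunzip_missing(source_dir, dest_dir) a second time should return an empty list, since every destination file already exists by then.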