Tags: python, ftp, gzip, ftplib

Extracting .gz files while downloading them from an FTP server


I've written a function that downloads .gz files from a given FTP server. I want to extract them on the fly while downloading and delete the compressed files afterwards. How can I do that?

import getpass
import os
import pathlib
from ftplib import FTP
from urllib.parse import urlparse

sinex_domain = "ftp://cddis.gsfc.nasa.gov/gnss/products/bias/2013"

def download(sinex_domain):
    user = getpass.getuser()
    sinex_parse = urlparse(sinex_domain)

    sinex_connetion = FTP(sinex_parse.netloc)
    sinex_connetion.login()
    sinex_connetion.cwd(sinex_parse.path)
    sinex_files = sinex_connetion.nlst()
    sinex_userpath = "C:\\Users\\" + user + "\\DCBviz\\sinex"
    pathlib.Path(sinex_userpath).mkdir(parents=True, exist_ok=True)

    for fileName in sinex_files:
        local_filename = os.path.join(sinex_userpath, fileName)
        file = open(local_filename, 'wb')
        sinex_connetion.retrbinary('RETR '+ fileName, file.write, 1024)
        
        #want to extract files in this loop

        file.close()

    sinex_connetion.quit()

download(sinex_domain)

Solution

  • There is probably a cleverer way that avoids holding each whole file in memory, but these appear to be quite small files (a few tens of kilobytes uncompressed). It is therefore sufficient to read the compressed data into a BytesIO buffer, decompress it in memory, and write the result to the output file. (The compressed data is never saved to disk, so there is nothing to delete afterwards.)

    You would add these imports:

    import gzip
    from io import BytesIO
    

    and then your main loop becomes:

        for fileName in sinex_files:
            local_filename = os.path.join(sinex_userpath, fileName)
            # Drop the .gz suffix so the local file carries the uncompressed name.
            if local_filename.endswith('.gz'):
                local_filename = local_filename[:-3]
            # Collect the compressed bytes in memory instead of on disk.
            data = BytesIO()
            sinex_connetion.retrbinary('RETR ' + fileName, data.write, 1024)
            # Decompress the whole buffer, then write the result.
            uncompressed = gzip.decompress(data.getvalue())
            with open(local_filename, 'wb') as file:
                file.write(uncompressed)
    

    (Note that the explicit file.close() is no longer needed, because the with block closes the file automatically.)
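
For larger files, one possible way to avoid buffering a whole file in memory is to decompress each chunk as it arrives. The following is only a sketch, not part of the original answer: GunzipWriter is a hypothetical helper name, and it assumes each file is a single gzip member. It relies on zlib.decompressobj with wbits set to 16 + zlib.MAX_WBITS, which makes zlib expect the gzip header and trailer. The demonstration runs offline and feeds the chunks by hand, the way retrbinary would call its callback.

```python
import gzip
import io
import zlib


class GunzipWriter:
    """Hypothetical helper: gunzips incoming chunks and writes them to out_file."""

    def __init__(self, out_file):
        self.out_file = out_file
        # wbits = 16 + MAX_WBITS makes zlib expect a gzip header and trailer.
        self.decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)

    def write(self, chunk):
        # Decompress only this chunk; earlier chunks are already written out.
        self.out_file.write(self.decompressor.decompress(chunk))

    def finish(self):
        # Emit any bytes the decompressor is still holding back.
        self.out_file.write(self.decompressor.flush())


# Offline demonstration: feed a gzip stream in small chunks, the same way
# retrbinary would repeatedly call the callback it is given.
original = b"example payload line\n" * 2000
compressed = gzip.compress(original)
output = io.BytesIO()
writer = GunzipWriter(output)
for start in range(0, len(compressed), 16):
    writer.write(compressed[start:start + 16])
writer.finish()
restored = output.getvalue()
```

In the loop above, this would mean opening local_filename first, creating writer = GunzipWriter(file), passing writer.write to retrbinary in place of data.write, and calling writer.finish() before the with block ends.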