I've created a function which download .gz files from given ftp server and I want to extract them on the fly while downloading and delete compressed files afterwards. How can I do that?
sinex_domain = "ftp://cddis.gsfc.nasa.gov/gnss/products/bias/2013"
def download(sinex_domain):
user = getpass.getuser()
sinex_parse = urlparse(sinex_domain)
sinex_connetion = FTP(sinex_parse.netloc)
sinex_connetion.login()
sinex_connetion.cwd(sinex_parse.path)
sinex_files = sinex_connetion.nlst()
sinex_userpath = "C:\\Users\\" + user + "\\DCBviz\\sinex"
pathlib.Path(sinex_userpath).mkdir(parents=True, exist_ok=True)
for fileName in sinex_files:
local_filename = os.path.join(sinex_userpath, fileName)
file = open(local_filename, 'wb')
sinex_connetion.retrbinary('RETR '+ fileName, file.write, 1024)
#want to extract files in this loop
file.close()
sinex_connetion.quit()
download(sinex_domain)
Although there is probably a cleverer way that avoids storing the whole data in memory for each file, these appear to be quite small files (a few tens of kilobytes uncompressed), so it would be sufficient to read the compressed data into a BytesIO
buffer, then decompress it in memory before writing it to the output file. (The compressed data is never saved to disk.)
You would add these imports:
import gzip
from io import BytesIO
and then your main loop becomes:
for fileName in sinex_files:
local_filename = os.path.join(sinex_userpath, fileName)
if local_filename.endswith('.gz'):
local_filename = local_filename[:-3]
data = BytesIO()
sinex_connetion.retrbinary('RETR '+ fileName, data.write, 1024)
data.seek(0)
uncompressed = gzip.decompress(data.read())
with open(local_filename, 'wb') as file:
file.write(uncompressed)
(Note that the file.close()
is not needed.)