python netcdf netcdf4

Open netCDF data stored in a gz file


I have netCDF files saved in a gz file and I am trying to import them as a geodataframe in Python. I don't know the names of the variables in the netCDF files.

My code:


import gzip

gzipped_file_path = 'Maize_1970_Yield_ver12b_BRA.nc.gz'


with gzip.open(gzipped_file_path, 'rb') as f:
    # Read the decompressed content of the gzipped file into memory as bytes
    content = f.read()

This part works fine, but then I try to create a dataset from the content:

df=nc.Dataset(content)

And it runs forever (it's been running for over 3 hours as of now). What is wrong with this code?


Solution

  • Okay, so the nc.Dataset function expects a file name (a path to a netCDF file on disk) as its first argument, not the raw bytes you read out of the gz file.

    So first, let's decompress the gz file into a regular file:

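    import gzip, os, shutil
    import netCDF4 as nc

    gzipped_file_path = 'Maize_1970_Yield_ver12b_BRA.nc.gz'
    temp_nc_path = 'temp_netcdf_file.nc'

    # Decompress the gz file into a temporary .nc file on disk.
    # copyfileobj copies in chunks instead of loading everything into memory.
    with gzip.open(gzipped_file_path, 'rb') as f_in, open(temp_nc_path, 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)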
    

    Once the with block exits, both files are closed and temp_nc_path holds the decompressed contents of the gz file, so you can pass that path to the nc.Dataset function:

    ds = nc.Dataset(temp_nc_path)
    print(ds.variables.keys()) # check the keys
    

    Finally, close the dataset and delete the temporary file so that leftover files don't pile up on your system:

    ds.close()
    os.remove(temp_nc_path)
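
The three steps above (decompress, open, clean up) can also be wrapped into small helpers. This is only a sketch: gunzip_to_temp and list_netcdf_variables are hypothetical names, not part of netCDF4, and the temporary file is created with tempfile.mkstemp instead of the fixed name used above, so that repeated runs do not overwrite each other.

```python
import gzip
import os
import shutil
import tempfile


def gunzip_to_temp(gz_path, suffix=".nc"):
    """Decompress gz_path into a new temporary file and return its path.

    Hypothetical helper; the caller is responsible for deleting the file.
    """
    fd, tmp_path = tempfile.mkstemp(suffix=suffix)
    try:
        # Copy in chunks so large files are never held fully in memory.
        with os.fdopen(fd, "wb") as f_out, gzip.open(gz_path, "rb") as f_in:
            shutil.copyfileobj(f_in, f_out)
    except BaseException:
        # Do not leave a half-written temporary file behind on failure.
        os.remove(tmp_path)
        raise
    return tmp_path


def list_netcdf_variables(gz_path):
    """Return the variable names found in a gzipped netCDF file."""
    # Imported here so that gunzip_to_temp can be used without netCDF4.
    import netCDF4 as nc

    tmp_path = gunzip_to_temp(gz_path)
    try:
        # Dataset works as a context manager and is closed on exit.
        with nc.Dataset(tmp_path) as ds:
            return list(ds.variables)
    finally:
        os.remove(tmp_path)
```

Calling list_netcdf_variables('Maize_1970_Yield_ver12b_BRA.nc.gz') should then give you the variable names without leaving a temporary file behind, even if opening the dataset fails.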
    

    That should do it.
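
If you would rather not write a temporary file at all, nc.Dataset also has a memory keyword argument that takes the file contents as a bytes-like buffer. The sketch below assumes your netCDF4 build supports in-memory reading (it depends on the underlying netCDF C library); read_gzipped_bytes and open_netcdf_from_bytes are hypothetical helper names, and the 'in_memory.nc' label is arbitrary.

```python
import gzip


def read_gzipped_bytes(gz_path):
    """Return the decompressed contents of gz_path as bytes."""
    with gzip.open(gz_path, "rb") as f:
        return f.read()


def open_netcdf_from_bytes(data, label="in_memory.nc"):
    """Open decompressed netCDF bytes without writing them to disk.

    Assumes a netCDF4 build with in-memory support.
    """
    # Imported here so that read_gzipped_bytes can be used without netCDF4.
    import netCDF4 as nc

    # The bytes go in through memory=...; the first argument is still
    # required, but here it only serves as a name for the dataset.
    return nc.Dataset(label, mode="r", memory=data)
```

With these, ds = open_netcdf_from_bytes(read_gzipped_bytes(gzipped_file_path)) would replace the temporary-file steps. The whole decompressed file has to fit in memory, so the temporary-file route is probably the safer choice for very large files.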