Tags: python, urllib, netcdf

Best function for pulling multiple NetCDF files from server, indexing, looping, and saving on file?


I'm pretty new to programming. I am trying to alter a script that was originally written to pull .txt data files so that it now pulls NetCDF files from an HTTP server, downloads them, renames them, and saves them locally (well, to another server location). I've pasted the base code, including actual buoy data file names for the NetCDF files. I believe there is an issue at the `urllib.request` line: I've tried `urllib.request.urlopen` and `urllib.request.urlretrieve` and both give errors.

    import os
    import urllib
    import urllib.request
    import shutil
    import netCDF4
    import requests
           
    # Weblink for location of spectra and wave data
    webSpectra = 'https://dods.ndbc.noaa.gov/thredds/fileServer/data/swden/41004/41004w9999.nc'
    
    webWave = 'https://dods.ndbc.noaa.gov/thredds/fileServer/data/stdmet/41004/41004h9999.nc'
       
    #set save location for each
    saveloc = 'saveSpectra41004w9999.nc'
    saveloc2 = 'saveWave41004h9999.nc'
    
    # perform pull
    try:
        urllib.request.urlopen(webSpectra, saveloc)
    except urllib.error.HTTPError as exception:
        print('Station: 41004 spectra file not available')
        print(exception)

    try:
        urllib.request.urlopen(webWave, saveloc2)
    except urllib.error.HTTPError as exception:
        print('Station: 41004 wave file not available')
        print(exception)

    print('Pulling data for 41004')
    # count and stationIndex come from the station loop in the full script
    print('Percent complete ' + str(round(100*(count/len(stationIndex)))))

    print('Done')

My error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-13-5e5ebd26fe46> in <module>
     59     # perform pull
     60     try:
---> 61         urllib.request.urlopen(webSpectra, saveloc)
     62     except urllib.error.HTTPError as exception:
     63         print('Station: 41004 spectra file not available')

/work/anaconda3/envs/aoes/lib/python3.6/urllib/request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
    221     else:
    222         opener = _opener
--> 223     return opener.open(url, data, timeout)
    224 
    225 def install_opener(opener):

/work/anaconda3/envs/aoes/lib/python3.6/urllib/request.py in open(self, fullurl, data, timeout)
    522         for processor in self.process_request.get(protocol, []):
    523             meth = getattr(processor, meth_name)
--> 524             req = meth(req)
    525 
    526         response = self._open(req, data)

/work/anaconda3/envs/aoes/lib/python3.6/urllib/request.py in do_request_(self, request)
   1277                 msg = "POST data should be bytes, an iterable of bytes, " \
   1278                       "or a file object. It cannot be of type str."
-> 1279                 raise TypeError(msg)
   1280             if not request.has_header('Content-type'):
   1281                 request.add_unredirected_header(

TypeError: POST data should be bytes, an iterable of bytes, or a file object. It cannot be of type str.
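The traceback points at the cause: the second positional argument of `urllib.request.urlopen` is `data`, the request body, so `urlopen(webSpectra, saveloc)` tries to POST the save-path string, which raises exactly this `TypeError`. The call that downloads a URL to a local path is `urllib.request.urlretrieve(url, filename)`. A minimal sketch of the corrected call, using a `file:` URL and a temporary file (names made up for illustration) so it runs without network access:

```python
import os
import tempfile
import urllib.request

# Create a stand-in "remote" file so the example runs offline.
src = os.path.join(tempfile.gettempdir(), 'demo_source.nc')
with open(src, 'wb') as f:
    f.write(b'fake netcdf bytes')

# urlretrieve(url, filename) saves the URL's contents to filename --
# unlike urlopen(url, data), whose second argument is a POST body.
dest = os.path.join(tempfile.gettempdir(), 'saveSpectra_demo.nc')
urllib.request.urlretrieve('file:' + urllib.request.pathname2url(src), dest)
```

Swapping `urlretrieve(webSpectra, saveloc)` in for the `urlopen` calls above removes the `TypeError`.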

Solution

  • You just want to download the files by the looks of it. You can do this using nctoolkit (https://nctoolkit.readthedocs.io/en/latest/). This will download the files to a temporary location. You can then export to xarray or pandas etc., or just save the file.

    Code below will work for one file:

    import nctoolkit as nc
    ds = nc.open_url('https://dods.ndbc.noaa.gov/thredds/fileServer/data/stdmet/41004/41004h9999.nc')
    # convert to xarray dataset
    ds_xr = ds.to_xarray()
    # convert to pandas dataframe
    df = ds.to_dataframe()
    # save to location
    ds.to_nc("outfile.nc")
    

    If the above does not work due to dependency issues etc., you can just use urllib:

    import urllib.request
    url = 'https://dods.ndbc.noaa.gov/thredds/fileServer/data/stdmet/41004/41004h9999.nc'
    urllib.request.urlretrieve(url, '/tmp/temp.nc')
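To get back to the original goal of pulling multiple files in a loop, the `urlretrieve` approach extends naturally. A sketch along these lines should work; the helper name, station list, and save directory are illustrative, and the URL pattern is inferred from the two NDBC links above:

```python
import os
import urllib.error
import urllib.request

def download_station_files(station_ids, save_dir,
                           base='https://dods.ndbc.noaa.gov/thredds/fileServer/data'):
    """Download spectra and wave files for each station, skipping missing ones."""
    os.makedirs(save_dir, exist_ok=True)
    for count, station in enumerate(station_ids, start=1):
        # (product subdirectory, filename suffix) pairs, per the URLs above
        for product, suffix in [('swden', 'w9999'), ('stdmet', 'h9999')]:
            url = f'{base}/{product}/{station}/{station}{suffix}.nc'
            dest = os.path.join(save_dir, f'{station}{suffix}.nc')
            try:
                urllib.request.urlretrieve(url, dest)
            except urllib.error.HTTPError as exception:
                print(f'Station {station}: {product} file not available ({exception})')
        print('Percent complete: ' + str(round(100 * count / len(station_ids))))
    print('Done')
```

Called as e.g. `download_station_files(['41004', '41008'], '/path/to/save')`, it reproduces the per-station error handling and progress printout from your original script.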