Tags: python, netcdf, opendap, thredds

How to download and subset netCDF files from NCEI THREDDS server


I am trying to download and subset the files located here: https://www.ncei.noaa.gov/thredds-ocean/catalog/ncei/archive/data/0129374/daily/catalog.html, but I'm not sure if I'm doing something wrong or if there is something wrong with the link. This is my first time downloading data from this service so I can't exactly tell.

If I hover over the link for the first file I see: https://www.ncei.noaa.gov/thredds-ocean/catalog/ncei/archive/data/0129374/daily/catalog.html?dataset=ncei/archive/data/0129374/daily/livneh_NAmerExt_15Oct2014.195001.nc

I've tried opening this URL using pydap:

from pydap.client import open_url

open_url('https://www.ncei.noaa.gov/thredds-ocean/catalog/ncei/archive/data/0129374/daily/catalog.html?dataset=ncei/archive/data/0129374/daily/livneh_NAmerExt_15Oct2014.195001.nc')

But I get the error:

webob.exc.HTTPError: 404 Not Found

If I use the netCDF4 library, I get a different error:

import netCDF4

netCDF4.Dataset('https://www.ncei.noaa.gov/thredds-ocean/catalog/ncei/archive/data/0129374/daily/catalog.html?dataset=ncei/archive/data/0129374/daily/livneh_NAmerExt_15Oct2014.195001.nc')

which gives me:

OSError: [Errno -75] NetCDF: Malformed or unexpected Constraint: b'https://www.ncei.noaa.gov/thredds-ocean/catalog/ncei/archive/data/0129374/daily/catalog.html?dataset=ncei/archive/data/0129374/daily/livneh_NAmerExt_15Oct2014.195001.nc'
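Both errors have the same cause: the catalog.html?dataset=... link is the HTML page THREDDS serves for browsing, not a data endpoint. A DAP2 client treats everything after "?" as a constraint expression, so netCDF4 (through the netCDF-C library) rejects "dataset=..." as a malformed constraint (error -75, NC_EDAPCONSTRAINT), and pydap ends up requesting something like .../catalog.html.dds?dataset=..., which does not exist on the server (404). The sketch below uses only urllib.parse to show how such a client splits the link; split_dap_url and dap2_dds_url are hypothetical helpers for illustration, and the DDS URL is an approximation of what pydap requests, not its exact internals.

```python
from urllib.parse import urlsplit, urlunsplit

# The link from the question (the THREDDS catalog page for one file)
CATALOG_URL = (
    "https://www.ncei.noaa.gov/thredds-ocean/catalog/ncei/archive/data/0129374/"
    "daily/catalog.html?dataset=ncei/archive/data/0129374/daily/"
    "livneh_NAmerExt_15Oct2014.195001.nc"
)


def split_dap_url(url: str) -> tuple[str, str]:
    """Split a URL into (base URL, query), roughly as a DAP2 client does.

    The query part is what the client interprets as a constraint expression.
    """
    parts = urlsplit(url)
    base = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return base, parts.query


def dap2_dds_url(url: str) -> str:
    """Approximate DDS (dataset structure) request a DAP2 client makes for `url`."""
    base, constraint = split_dap_url(url)
    return f"{base}.dds?{constraint}" if constraint else f"{base}.dds"


base, constraint = split_dap_url(CATALOG_URL)
print("base:       ", base)        # ends in catalog.html: an HTML page, not a dataset
print("constraint: ", constraint)  # 'dataset=...' is not valid DAP2 constraint syntax
print("DDS request:", dap2_dds_url(CATALOG_URL))
```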

Is it possible that there is something wrong with the link? How can I download and subset this data?


Solution

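  • At present you are using the wrong URL. The catalog.html?dataset=... link points to the THREDDS catalog page, which is HTML for browsing rather than a data endpoint. You need the OPeNDAP link, https://www.ncei.noaa.gov/thredds-ocean/dodsC/ncei/archive/data/0129374/daily/livneh_NAmerExt_15Oct2014.195001.nc.html, with the trailing .html removed (the .html version is the OPeNDAP data access form you see in a browser). In other words, take the dataset path after dataset= and put it after thredds-ocean/dodsC/.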

    I have tested this using my nctoolkit package and it seems to work fine:

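    import nctoolkit as nc

    # Open the remote file through its OPeNDAP endpoint (dodsC, not catalog)
    ds = nc.open_thredds("https://www.ncei.noaa.gov/thredds-ocean/dodsC/ncei/archive/data/0129374/daily/livneh_NAmerExt_15Oct2014.195001.nc")
    # Subset: keep only the first time step
    ds.select(time=0)
    # Quick-look plot of the subset
    ds.plot()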
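If you would rather stay with netCDF4 (or pydap), the same URL rewrite can be done in code and the subsetting done by slicing the remote variable. This is a minimal sketch, assuming the server exposes OPeNDAP under dodsC/ in the same THREDDS context as catalog/ (as it does for this file) and that time is the first dimension of the variable you read. catalog_to_opendap and read_time_step are hypothetical helpers, and "Prec" in the usage comment is a placeholder, not a variable name confirmed from this file; list ds.variables to find the real names.

```python
from urllib.parse import parse_qs, urlsplit, urlunsplit


def catalog_to_opendap(catalog_url: str) -> str:
    """Convert a THREDDS 'catalog.html?dataset=<path>' link to '<context>/dodsC/<path>'.

    Assumption: OPeNDAP is served under 'dodsC/' in the same THREDDS context as
    'catalog/'. Other servers may use a different service path.
    """
    parts = urlsplit(catalog_url)
    dataset = parse_qs(parts.query)["dataset"][0]
    # '/thredds-ocean/catalog/.../catalog.html' -> '/thredds-ocean'
    context = parts.path.split("/catalog/", 1)[0]
    return urlunsplit((parts.scheme, parts.netloc, f"{context}/dodsC/{dataset}", "", ""))


def read_time_step(opendap_url: str, var: str, time_index: int = 0):
    """Read one time step of `var` over OPeNDAP as a (masked) NumPy array.

    Assumes time is the first dimension of `var`. Indexing the netCDF4 Variable,
    instead of reading it whole, means only this slice is requested from the server.
    """
    import netCDF4  # imported lazily so catalog_to_opendap works without netCDF4

    with netCDF4.Dataset(opendap_url) as ds:
        return ds.variables[var][time_index, ...]


# Example usage (needs network access and a netCDF4 build with DAP support):
# import netCDF4
# url = catalog_to_opendap(
#     "https://www.ncei.noaa.gov/thredds-ocean/catalog/ncei/archive/data/0129374/"
#     "daily/catalog.html?dataset=ncei/archive/data/0129374/daily/"
#     "livneh_NAmerExt_15Oct2014.195001.nc"
# )
# with netCDF4.Dataset(url) as ds:
#     print(list(ds.variables))             # check the real variable names first
# first_day = read_time_step(url, "Prec")   # "Prec" is a hypothetical name
```

The URL produced by catalog_to_opendap is the same dodsC link used with nctoolkit above, so it can be passed to any OPeNDAP-capable reader.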