I'm trying to use the erddapy package to retrieve data from an ERDDAP server. This is the code I'm running in a Jupyter notebook:
    from erddapy import ERDDAP
    import pandas as pd
    import xarray as xr
    import matplotlib.pyplot as plt
    import numpy as np

    def download_oisst_data(start_date, end_date, min_lat, max_lat, min_lon, max_lon):
        e = ERDDAP(
            server="https://coastwatch.pfeg.noaa.gov/erddap/",
            protocol="tabledap",
        )
        e.dataset_id = "ncdcOisst21Agg_LonPM180"  # Correctly set the dataset ID
        e.variables = ["time", "latitude", "longitude", "sst"]
        e.constraints = {
            "time>=": start_date,
            "time<=": end_date,
            "latitude>=": min_lat,
            "latitude<=": max_lat,
            "longitude>=": min_lon,
            "longitude<=": max_lon,
        }
        # Fetch the data and convert it to a pandas DataFrame
        df = e.to_pandas(
            index_col="time (UTC)",
            parse_dates=True,
            skiprows=(1,),  # Skip the units row
        ).dropna()
        # Convert the DataFrame to an xarray Dataset
        ds = df.to_xarray()
        return ds

    ds = download_oisst_data(start_date, end_date, min_lat, max_lat, min_lon, max_lon)
This code raises the following error:
---------------------------------------------------------------------------
HTTPStatusError Traceback (most recent call last)
File ~/miniconda3/lib/python3.10/site-packages/erddapy/core/url.py:24, in _urlopen(url, auth, **kwargs)
23 try:
---> 24 response.raise_for_status()
25 except httpx.HTTPError as err:
File ~/miniconda3/lib/python3.10/site-packages/httpx/_models.py:761, in Response.raise_for_status(self)
760 message = message.format(self, error_type=error_type)
--> 761 raise HTTPStatusError(message, request=request, response=self)
HTTPStatusError: Client error '404 ' for url 'https://coastwatch.pfeg.noaa.gov/erddap/tabledap/ncdcOisst21Agg_LonPM180.csvp?time,latitude,longitude,sst&time%3E=368150400.0&time%3C=1704067199.0&latitude%3E=-40&latitude%3C=30&longitude%3E=30&longitude%3C=100'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/404
The above exception was the direct cause of the following exception:
HTTPError Traceback (most recent call last)
Cell In[24], line 1
----> 1 ds = download_oisst_data(start_date, end_date, min_lat, max_lat, min_lon, max_lon)
Cell In[22], line 18, in download_oisst_data(start_date, end_date, min_lat, max_lat, min_lon, max_lon)
8 e.constraints = {
9 "time>=": start_date,
10 "time<=": end_date,
(...)
14 "longitude<=": max_lon,
15 }
17 # Fetch the data and convert it to a pandas DataFrame
---> 18 df = e.to_pandas(
19 index_col="time (UTC)",
20 parse_dates=True,
21 skiprows=(1,) # Skip the units row
22 ).dropna()
24 # Convert the DataFrame to an xarray Dataset, if needed
25 # This step requires importing xarray and possibly additional processing depending on the data structure
26 ds = df.to_xarray() # Uncomment this line if you have the necessary setup for converting DataFrame to xarray Dataset
File ~/miniconda3/lib/python3.10/site-packages/erddapy/erddapy.py:361, in ERDDAP.to_pandas(self, requests_kwargs, **kw)
359 distinct = kw.pop("distinct", False)
360 url = self.get_download_url(response=response, distinct=distinct)
--> 361 return to_pandas(url, requests_kwargs=requests_kwargs, pandas_kwargs=dict(**kw))
File ~/miniconda3/lib/python3.10/site-packages/erddapy/core/interfaces.py:31, in to_pandas(url, requests_kwargs, pandas_kwargs)
19 def to_pandas(
20 url: str,
21 requests_kwargs: Optional[Dict] = None,
22 pandas_kwargs: Optional[Dict] = None,
23 ) -> "pd.DataFrame":
24 """
25 Convert a URL to Pandas DataFrame.
26
(...)
29 **pandas_kwargs: kwargs to be passed to third-party library (pandas).
30 """
---> 31 data = urlopen(url, requests_kwargs or {})
32 try:
33 return pd.read_csv(data, **(pandas_kwargs or {}))
File ~/miniconda3/lib/python3.10/site-packages/erddapy/core/url.py:42, in urlopen(url, requests_kwargs)
40 if requests_kwargs is None:
41 requests_kwargs = {}
---> 42 data = _urlopen(url, **requests_kwargs) # type: ignore
43 data.seek(0)
44 return data
File ~/miniconda3/lib/python3.10/site-packages/erddapy/core/url.py:26, in _urlopen(url, auth, **kwargs)
24 response.raise_for_status()
25 except httpx.HTTPError as err:
---> 26 raise httpx.HTTPError(f"{response.content.decode()}") from err
27 return io.BytesIO(response.content)
HTTPError: Error {
code=404;
message="Not Found: Currently unknown datasetID=ncdcOisst21Agg_LonPM180";
}
Here, an unknown-datasetID error is raised for the dataset "ncdcOisst21Agg_LonPM180". However, when I visit https://coastwatch.pfeg.noaa.gov/erddap/ and search for "sst", that dataset ID does appear in the search results.
I'm using a 2020 MacBook Air with an Intel i5. Kindly let me know what I should do to fix this error.
It seems you are using the wrong protocol setting for that dataset. In the line HTTPStatusError: Client error '404 ' for url, note that the base URL being requested is:
https://coastwatch.pfeg.noaa.gov/erddap/tabledap/ncdcOisst21Agg_LonPM180.csvp
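The protocol is simply a path segment in that URL, so one quick check is to pull the URL apart and open the same dataset's page under the other protocol in a browser. This is a minimal sketch, assuming ERDDAP data URLs follow the pattern /erddap/<protocol>/<datasetID>.<fileType> (as in the failing URL) and that <protocol>/<datasetID>.html is the dataset's data access form; the helper names are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def protocol_and_dataset(url: str):
    """Split an ERDDAP data URL into (protocol, dataset_id, file_type)."""
    # Assumed path layout: /<root>/<protocol>/<dataset_id>.<file_type>
    parts = urlsplit(url).path.strip("/").split("/")
    dataset_id, _, file_type = parts[-1].partition(".")
    return parts[-2], dataset_id, file_type


def dataset_page(url: str, protocol: str) -> str:
    """Build the .html form URL for the same dataset under another protocol."""
    scheme, netloc, path, _, _ = urlsplit(url)
    parts = path.strip("/").split("/")
    root = "/".join(parts[:-2])  # everything before the protocol, e.g. "erddap"
    _, dataset_id, _ = protocol_and_dataset(url)
    return urlunsplit((scheme, netloc, f"/{root}/{protocol}/{dataset_id}.html", "", ""))
```

Applied to the failing URL, the first helper reports tabledap as the protocol; if the tabledap page returns 404 while the griddap page built by the second helper loads, the dataset is a gridded one.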
If you then go to https://coastwatch.pfeg.noaa.gov/erddap/tabledap/, which resolves to https://coastwatch.pfeg.noaa.gov/erddap/tabledap/index.html?page=1&itemsPerPage=1000, and look through the list, you'll see that the dataset you are after isn't among the 290 listed there.
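Looking through that list by hand can also be scripted. The sketch below is a minimal version, assuming the server offers the same listing as CSV at <protocol>/index.csv (alongside index.html) with a "Dataset ID" column; both are assumptions about ERDDAP's listing output, and the function names are hypothetical. Only protocols_serving touches the network.

```python
import csv
import io
import urllib.request


def listing_url(server: str, protocol: str) -> str:
    # Assumed: the CSV listing lives next to index.html for each protocol.
    return f"{server.rstrip('/')}/{protocol}/index.csv?page=1&itemsPerPage=1000"


def dataset_ids(csv_text: str) -> set:
    # Assumed: the listing has a "Dataset ID" column; blank values are skipped.
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["Dataset ID"] for row in reader if row.get("Dataset ID")}


def protocols_serving(dataset_id: str, server: str,
                      protocols=("griddap", "tabledap")) -> list:
    # Download each protocol's listing and report which ones contain the ID.
    found = []
    for protocol in protocols:
        with urllib.request.urlopen(listing_url(server, protocol), timeout=60) as resp:
            text = resp.read().decode("utf-8")
        if dataset_id in dataset_ids(text):
            found.append(protocol)
    return found
```

For example, protocols_serving("ncdcOisst21Agg_LonPM180", "https://coastwatch.pfeg.noaa.gov/erddap/") would be expected to return only griddap if the dataset is gridded.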
If you go back to the main page, you'll see that the protocol listed just above the one you tried is griddap. Click through to those datasets and look for yours.
I found yours listed there by using 'Advanced Search' to narrow the results down. You can run that search yourself; note the protocol field on the Advanced Search page.
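The same Advanced Search can be expressed as a URL and its results read as CSV. This is a minimal sketch, assuming the Advanced Search endpoint is search/advanced.csv and accepts searchFor, protocol, page and itemsPerPage parameters, and that each result row has one column per protocol (holding a URL, or empty when that protocol is not offered) plus a "Dataset ID" column; these are assumptions about ERDDAP's search interface, and the helper names are hypothetical.

```python
import csv
import io
from urllib.parse import urlencode


def advanced_search_url(server: str, search_for: str, protocol: str = "griddap",
                        items_per_page: int = 1000) -> str:
    # Assumed ERDDAP Advanced Search parameter names.
    params = {"searchFor": search_for, "protocol": protocol,
              "page": 1, "itemsPerPage": items_per_page}
    return f"{server.rstrip('/')}/search/advanced.csv?{urlencode(params)}"


def protocols_for(dataset_id: str, csv_text: str,
                  protocols=("griddap", "tabledap")) -> list:
    # Return the protocols whose column is non-empty in the row for dataset_id.
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row.get("Dataset ID") == dataset_id:
            return [p for p in protocols if row.get(p)]
    return []
```

Fetching advanced_search_url("https://coastwatch.pfeg.noaa.gov/erddap/", "sst") in a browser or with urllib and passing the text to protocols_for shows which protocol column is filled in for your dataset.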
So try changing the protocol line to:

    protocol="griddap",
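With griddap, erddapy also handles constraints differently from tabledap, so here is a minimal sketch of the whole function adapted to it. It assumes erddapy's griddap workflow as I understand it: griddap_initialize() fills e.constraints from the dataset metadata (including the *_step keys and any other dimensions the dataset has), so the range limits are merged in with update() rather than replacing the dict, and to_xarray() returns an xarray Dataset directly. oisst_bounds is a hypothetical helper, and the import sits inside the function only so that helper works without erddapy installed.

```python
SERVER = "https://coastwatch.pfeg.noaa.gov/erddap/"
DATASET_ID = "ncdcOisst21Agg_LonPM180"


def oisst_bounds(start_date, end_date, min_lat, max_lat, min_lon, max_lon) -> dict:
    """Range limits for the time, latitude and longitude dimensions."""
    return {
        "time>=": start_date,
        "time<=": end_date,
        "latitude>=": min_lat,
        "latitude<=": max_lat,
        "longitude>=": min_lon,
        "longitude<=": max_lon,
    }


def download_oisst_data(start_date, end_date, min_lat, max_lat, min_lon, max_lon):
    # Imported here so oisst_bounds can be used without erddapy installed.
    from erddapy import ERDDAP

    e = ERDDAP(server=SERVER, protocol="griddap")
    e.dataset_id = DATASET_ID
    # Populates e.constraints and e.variables from the dataset's metadata.
    e.griddap_initialize()
    # Update rather than replace, so the keys griddap_initialize set are kept.
    e.constraints.update(
        oisst_bounds(start_date, end_date, min_lat, max_lat, min_lon, max_lon)
    )
    e.variables = ["sst"]
    # Returns an xarray Dataset, so no pandas round-trip is needed.
    return e.to_xarray()
```

Call it exactly as before. Two caveats: the range in your failing URL spans roughly 1981 to 2023 of daily data over a 70 by 70 degree box, which is a very large request, so try a few days first. And if you keep the to_pandas route instead, note that erddapy requested the .csvp file type here, where units are part of the column headers rather than a separate row, so skiprows=(1,) would discard the first data row.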