I need to download public bathymetry data from NOAA NCEI. This often means downloading hundreds of small files that are later stitched together. NCEI has a tool for this, "request files": you click Request, enter your email address, and wait for them to send you a zipped folder containing all the files. This can take more than a week, and requests sometimes fail without notice. I would like to avoid that method. Below is an example data source:
Ultimately, I would like to use wget or curl to download every .gz file from an NCEI page such as that one. All the file links are present on the page: right-clicking a link, choosing "open in new window", and pressing Enter downloads that single file immediately. When you do this, it redirects to a link like this:
How can I use a command line tool like wget to download all .gz files from a page like this?
I have tried commands such as:
wget --execute="robots = off" -A.gz --mirror --convert-links --no-parent [url]
but I get one of two errors for the .gz files:
"Unable to establish SSL connection."
"HTTP request sent, awaiting response... 301 Moved Permanently"
You are accessing https://www.ngdc.noaa.gov/ships/nautilus/NA072_mb.html, but its links point to URLs such as http://data.ngdc.noaa.gov/platforms/ocean/ships/nautilus/NA072/multibeam/data/version2/MB/em302/0000_20160601_180321_Nautilus_EM302.gsf.mb121.gz, which are on a different host. By default, wget does not crawl to other hosts; to allow it, combine --recursive with -H (--span-hosts). Consult the "Spanning Hosts" section of the wget manual for more information.
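Because the page already lists every file as an absolute link, another route is to skip the crawl entirely: pull the .gz URLs out of the HTML and hand the list to wget. The sketch below assumes the links are written as href="..." with double quotes and absolute http/https URLs; the function name extract_gz_links is hypothetical.

```shell
#!/bin/sh
# Sketch: list the absolute .gz links on an NCEI page so wget can fetch them
# directly, without a recursive crawl.
# Assumption: links are absolute and written as href="http(s)://...gz"
# with double quotes. extract_gz_links is a hypothetical helper name.

extract_gz_links() {
    # Read HTML on stdin; print each distinct .gz URL on its own line.
    grep -o 'href="https\{0,1\}://[^"]*\.gz"' |
        sed -e 's/^href="//' -e 's/"$//' |
        LC_ALL=C sort -u
}

# Example usage (needs network access):
#   curl -s 'https://www.ngdc.noaa.gov/ships/nautilus/NA072_mb.html' \
#       | extract_gz_links > urls.txt
#   wget --input-file=urls.txt
```

Since wget is given explicit URLs here rather than following links, host spanning does not come into play; if the page's markup differs from the assumed pattern, the grep expression would need adjusting.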
I suggest using the following command:
wget --recursive --level=1 -H --accept gz 'https://www.ngdc.noaa.gov/ships/nautilus/NA072_mb.html'
Besides the --recursive and -H combination mentioned above, I limited the recursion level (depth) to 1, meaning that only links present on the given page are followed (not links on the linked pages, and so on), and restricted downloads to gz files with --accept gz. Please try running that command and report whether it downloads the desired files.
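Since failed or truncated downloads are the concern with the request-files route, it may be worth checking the result. A minimal sketch, assuming the files end up somewhere under one directory (recursive wget normally saves into a directory named after the host, e.g. data.ngdc.noaa.gov) and that file names contain no newlines; check_gz is a hypothetical helper name. It runs gzip -t on every .gz file and reports the ones that fail.

```shell
#!/bin/sh
# Sketch: test every downloaded .gz file with `gzip -t` so truncated or
# failed downloads do not go unnoticed.
# Assumptions: file names contain no newlines; check_gz is a hypothetical name.

check_gz() {
    # $1: directory wget saved into (defaults to the current directory)
    find "${1:-.}" -type f -name '*.gz' | {
        total=0
        bad=0
        while IFS= read -r f; do
            total=$((total + 1))
            # gzip -t checks integrity without writing output; non-zero on bad data
            if ! gzip -t "$f" 2>/dev/null; then
                bad=$((bad + 1))
                echo "corrupt: $f"
            fi
        done
        echo "checked $total file(s), $bad corrupt"
        # Status of the whole pipeline: 0 only if every file passed
        [ "$bad" -eq 0 ]
    }
}

# Example usage after running the wget command above:
#   check_gz data.ngdc.noaa.gov
```

Any file listed as corrupt can then be fetched again individually with wget.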