I have a netcdf file (global domain) which I do not know its projection information, and I want to extract the data (with lon and lat) based on the shapefile.
The objective is to find data for the domain represented by the shapefile ( the original netcdf contains global data). Besides, the final data format should be a data frame, which contains lon, lat, and data. I assume that the extract
and mask
function will be useful.
The netcdf and shapefile can be found at https://www.dropbox.com/s/ju96eitzzd0xxg8/precip.2000.nc?dl=0 and https://www.dropbox.com/s/8wfgf8207dbh79r/gpr_000b11a_e.zip?dl=0. Thanks for any help.
library(rgdal)
library(ncdf4)
library(raster)
shp = readOGR("gpr_000b11a_e.shp")
pre1 = nc_open("precip.2000.nc")
pre1
File precip.2000.nc (NC_FORMAT_NETCDF4_CLASSIC):
1 variables (excluding dimension variables):
float precip[lon,lat,time]
missing_value: -9.96920996838687e+36
var_desc: Precipitation
level_desc: Surface
statistic: Total
parent_stat: Other
long_name: Daily total of precipitation
cell_methods: time: sum
avg_period: 0000-00-01 00:00:00
actual_range: 0
actual_range: 482.555358886719
units: mm
valid_range: 0
valid_range: 1000
dataset: CPC Global Precipitation
3 dimensions:
lat Size:360
actual_range: 89.75
actual_range: -89.75
long_name: Latitude
units: degrees_north
axis: Y
standard_name: latitude
coordinate_defines: center
lon Size:720
long_name: Longitude
units: degrees_east
axis: X
standard_name: longitude
actual_range: 0.25
actual_range: 359.75
coordinate_defines: center
time Size:366 *** is unlimited ***
long_name: Time
axis: T
standard_name: time
coordinate_defines: start
actual_range: 876576
actual_range: 885336
delta_t: 0000-00-01 00:00:00
avg_period: 0000-00-01 00:00:00
units: hours since 1900-01-01 00:00:00
7 global attributes:
Conventions: CF-1.0
version: V1.0
history: created 9/2016 by CAS NOAA/ESRL PSD
title: CPC GLOBAL PRCP V1.0
References: https://www.esrl.noaa.gov/psd/data/gridded/data.cpc.globalprecip.html
dataset_title: CPC GLOBAL PRCP V1.0
Source: ftp://ftp.cpc.ncep.noaa.gov/precip/CPC_UNI_PRCP/
As we can see, there is no information about the projection or crs for the netcdf file.
You need to open the NetCDF file as a raster* object to process as a raster it in R. Use brick
or stack
for this, rather than nc_open
pre1.brick = brick("precip.2000.nc")
You will notice that this file uses the normal convention in climate science of giving longitudes in values ranging from 0 to 360 degrees:
extent(pre1.brick)
# class : Extent
# xmin : 0
# xmax : 360
# ymin : -90
# ymax : 90
We can convert to conventional -180 to 180 longitudes usint rotate
pre1.brick = rotate(pre1.brick)
Now lets see what the projections of our raster and polygon files are:
crs(shp)
# +proj=longlat +datum=NAD83 +no_defs +ellps=GRS80 +towgs84=0,0,0
crs(pre1.brick)
# +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
You can see that both use lat-long coordinates, but differ in terms of datum and ellipse. It is usually recommended for accuracy and speed, when possible, to project the vector data rather than the raster to get them both in the same coordinate system:
shp = spTransform(shp, crs(pre1.brick))
Having projected both to the same coord system, we can apply the shapefile as a mask to the raster brick:
pre1.mask = mask(pre1.brick, shp)
And convert to a data.frame. Here I just demonstrate for the first layer. You can do all layers at once if you prefer, by removing [[1]]
in the following line
pre1.df = as.data.frame(pre1.mask[[1]], xy=TRUE)
If you want, you can remove lines that have NA, to only leave cells inside the mask containing data:
pre1.df = pre1.df[complete.cases(pre1.df),]
head(pre1.df)
# x y X2000.01.01.00.00.00
# 10278 -81.25 82.75 0.2647110
# 10279 -80.75 82.75 0.2721323
# 10280 -80.25 82.75 0.2797630
# 10281 -79.75 82.75 0.2875668
# 10282 -79.25 82.75 0.2925712
# 10283 -78.75 82.75 0.2995636