I'm working with 3-dimensional (x, y, time) NetCDF files that contain hourly PM10 concentration estimates for a year. My aim is to extract the hourly estimates at several coordinates --- that will be 365 days * 24 hrs = 8760 estimates per coordinate per year --- and then average them into daily (365) estimates.
My script (see below) works fine for 2013, but for 2012 the output has lots of NAs. The difference I noticed is that lon/lat in the 2012 file are stored as 2-D matrices rather than 1-D vectors:
File E:/ENSa.2012.PM10.yearlyrea_.nc (NC_FORMAT_CLASSIC):

     3 variables (excluding dimension variables):
        float lon[x,y]
            long_name: Longitude
            units: degrees_east
        float lat[x,y]
            long_name: Latitude
            units: degrees_north
        float PM10[x,y,time]
            units: ug/m3

     3 dimensions:
        x  Size:701
        y  Size:401
        time  Size:8784   *** is unlimited ***
            units: day as %Y%m%d.%f
            calendar: proleptic_gregorian
head(lon)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] -25.0 -25.0 -25.0 -25.0 -25.0 -25.0 -25.0 -25.0 -25.0
[2,] -24.9 -24.9 -24.9 -24.9 -24.9 -24.9 -24.9 -24.9 -24.9
For the 2013 file, lon is 'normal', like this:
File E:/ENSa.2013.PM25.yearlyrea.nc (NC_FORMAT_NETCDF4):

     1 variables (excluding dimension variables):
        float PM25[lon,lat,time]  (Chunking: [701,401,1])
            long_name: PM25
            units: ug
            _FillValue: -999

     3 dimensions:
        lon  Size:701
            standard_name: longitude
            long_name: longitude
            units: degrees_east
            axis: X
        lat  Size:401
            standard_name: latitude
            long_name: latitude
            units: degrees_north
            axis: Y
        time  Size:8760   *** is unlimited ***
            standard_name: time
            long_name: time at end of period
            units: day as %Y%m%d.%f
            calendar: proleptic_gregorian
head(lon)
[1] -25.0 -24.9 -24.8 -24.7 -24.6 -24.5
I'm using the following script:
library(raster)  # provides brick(), getZ(), extract()

# brick() reads all layers (time slices) in the file
pm102013 <- brick("ENSa.2013.PM10.yearlyrea.nc", varname = "PM10")

# Get the date index from the file
idx <- getZ(pm102013)

# Station coordinates (longitude, latitude) and extraction for all time steps
coords <- matrix(c(-2.094278, -1.830583, -2.584482, -0.175269, -3.17625,
                    0.54797, -2.678731, -1.433611, -1.456944, -3.182186,
                   57.15736, 52.511722, 51.462839, 51.54421, 51.48178,
                   51.374264, 51.638094, 53.230583, 53.231722, 55.945589),
                 ncol = 2)
vals <- extract(pm102013, coords, df = TRUE)

# Merge dates and values and fix the data frame names
df.pm102013 <- data.frame(idx, t(vals)[-1, ])
rownames(df.pm102013) <- NULL
names(df.pm102013) <- c('date', 'UKA00399', 'UKA00479', 'UKA00494', 'UKA00259',
                        'UKA00217', 'UKA00553', 'UKA00515', 'UKA00530',
                        'UKA00529', 'UKA00454')

# Output
options(max.print = 100000000)
sink("PM10_2013.txt")
print(df.pm102013)
sink()
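For the daily-averaging step, grouping the hourly rows by calendar day with `aggregate()` should work. A minimal self-contained sketch (a synthetic two-day `df` with one station column stands in for `df.pm102013`):

```r
# Synthetic stand-in for df.pm102013: 48 hourly values, first day all 10, second day all 20
idx <- seq(as.POSIXct("2013-01-01 00:00", tz = "UTC"), by = "hour", length.out = 48)
df  <- data.frame(date = idx, UKA00399 = rep(c(10, 20), each = 24))

# Collapse hourly rows to daily means: drop the date column from the values,
# group by calendar day, take the mean of each station column
daily <- aggregate(df[-1],
                   by  = list(date = as.Date(df$date)),
                   FUN = mean, na.rm = TRUE)
# daily now has one row per day: means 10 and 20
```

The same call applied to the real `df.pm102013` yields the 365 (or 366) daily rows per station.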
Does anyone know if there's a way to 'fix' the lon/lat problem? Or is there a more efficient way to extract and average the data?
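A likely cause of the NAs: raster cannot use the 2-D lon/lat variables of the 2012 file, so the brick gets an index-based extent and `extract()` at real-world coordinates falls outside it. The `head(lon)` output above suggests the grid is actually regular (lon constant along y), so the 2-D arrays can be collapsed to 1-D vectors and the extent rebuilt. A sketch assuming the ncdf4 and raster packages, not tested against the actual file:

```r
library(ncdf4)
library(raster)

# Read the 2-D coordinate arrays from the 2012 file
nc  <- nc_open("E:/ENSa.2012.PM10.yearlyrea_.nc")
lon <- ncvar_get(nc, "lon")   # 701 x 401 matrix
lat <- ncvar_get(nc, "lat")
nc_close(nc)

# Collapse to 1-D, checking first that the grid really is regular:
# every column of lon and every row of lat must be identical
lon1d <- lon[, 1]   # lon varies along x only
lat1d <- lat[1, ]   # lat varies along y only
stopifnot(all(lon == lon1d), all(t(lat) == lat1d))

# Give the brick a proper geographic extent built from the 1-D vectors
# (strictly, half a cell width should be added on each side, since these
# are cell centres, not edges)
pm102012 <- brick("E:/ENSa.2012.PM10.yearlyrea_.nc", varname = "PM10")
extent(pm102012) <- extent(min(lon1d), max(lon1d), min(lat1d), max(lat1d))

# extract(pm102012, coords, df = TRUE) should now return values, not NAs
```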
You can extract the nearest grid point to a given lon/lat and compute the daily average using CDO from the command line in bash:
lon=34.4
lat=22.1
cdo daymean -remapnn,lon=${lon}/lat=${lat} input.nc output_${lon}_${lat}.nc
The minus sign on remapnn means its result is piped into the daymean command. You can put this in a bash loop over the desired points.
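Such a loop could look like this (a dry-run sketch: the lists hold the first three station coordinates from the script in the question, `input.nc` is a placeholder, and the cdo command is only echoed):

```shell
#!/bin/sh
# Build one CDO command per station by pairing parallel lon/lat lists.
lons="-2.094278 -1.830583 -2.584482"
lats="57.15736 52.511722 51.462839"

set -- $lats                 # positional parameters hold the latitudes
for lon in $lons; do
  lat=$1; shift              # pair each lon with the matching lat
  # Remove the echo to actually run CDO:
  echo cdo daymean -remapnn,lon=${lon}/lat=${lat} input.nc output_${lon}_${lat}.nc
done
```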