I'm working with 3-dimensional (x, y, time) NetCDF files that contain hourly PM10 concentration estimates for a year. My aim is to extract the hourly estimates at several coordinates --- that will be 365 days * 24 hrs = 8760 estimates per coordinate per year --- and then average them into daily (365) estimates.
My script (see below) works fine for 2013, but for 2012 the output has lots of NAs. The difference I noticed is that lon/lat in the 2012 file are stored as 2-D matrices rather than 1-D vectors:
File E:/ENSa.2012.PM10.yearlyrea_.nc (NC_FORMAT_CLASSIC):

     3 variables (excluding dimension variables):
        float lon[x,y]
            long_name: Longitude
            units: degrees_east
        float lat[x,y]
            long_name: Latitude
            units: degrees_north
        float PM10[x,y,time]
            units: ug/m3

     3 dimensions:
        x  Size:701
        y  Size:401
        time  Size:8784   *** is unlimited ***
            units: day as %Y%m%d.%f
            calendar: proleptic_gregorian
head(lon)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] -25.0 -25.0 -25.0 -25.0 -25.0 -25.0 -25.0 -25.0 -25.0
[2,] -24.9 -24.9 -24.9 -24.9 -24.9 -24.9 -24.9 -24.9 -24.9
For the 2013 file, lon is 'normal', like this:
File E:/ENSa.2013.PM25.yearlyrea.nc (NC_FORMAT_NETCDF4):

     1 variables (excluding dimension variables):
        float PM25[lon,lat,time]  (Chunking: [701,401,1])
            long_name: PM25
            units: ug
            _FillValue: -999

     3 dimensions:
        lon  Size:701
            standard_name: longitude
            long_name: longitude
            units: degrees_east
            axis: X
        lat  Size:401
            standard_name: latitude
            long_name: latitude
            units: degrees_north
            axis: Y
        time  Size:8760   *** is unlimited ***
            standard_name: time
            long_name: time at end of period
            units: day as %Y%m%d.%f
            calendar: proleptic_gregorian
head(lon)
[1] -25.0 -24.9 -24.8 -24.7 -24.6 -24.5
I'm using the following script:
library(raster)  # provides brick(), getZ(), extract()

# brick() reads all layers (time slices) in the file
pm102013 <- brick("ENSa.2013.PM10.yearlyrea.nc", varname = "PM10")

# Get the date index from the file
idx <- getZ(pm102013)

# Station coordinates (longitude, latitude) and extraction for all time steps
coords <- matrix(c(-2.094278, -1.830583, -2.584482, -0.175269, -3.17625,
                    0.54797, -2.678731, -1.433611, -1.456944, -3.182186,
                   57.15736, 52.511722, 51.462839, 51.54421, 51.48178,
                   51.374264, 51.638094, 53.230583, 53.231722, 55.945589),
                 ncol = 2)
vals <- extract(pm102013, coords, df = TRUE)

# Merge dates and values and fix the data frame names
df.pm102013 <- data.frame(idx, t(vals)[-1, ])
rownames(df.pm102013) <- NULL
names(df.pm102013) <- c('date', 'UKA00399', 'UKA00479', 'UKA00494', 'UKA00259',
                        'UKA00217', 'UKA00553', 'UKA00515', 'UKA00530',
                        'UKA00529', 'UKA00454')

# Output
options(max.print = 100000000)
sink("PM10_2013.txt")
print(df.pm102013)
sink()
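For the daily-averaging step, grouping the hourly rows by calendar day with `aggregate()` should work. A minimal self-contained sketch (a synthetic two-day `df` with one station column stands in for `df.pm102013`):

```r
# Synthetic stand-in for df.pm102013: 48 hourly values, first day all 10, second day all 20
idx <- seq(as.POSIXct("2013-01-01 00:00", tz = "UTC"), by = "hour", length.out = 48)
df  <- data.frame(date = idx, UKA00399 = rep(c(10, 20), each = 24))

# Collapse hourly rows to daily means: drop the date column from the values,
# group by calendar day, take the mean of each station column
daily <- aggregate(df[-1],
                   by  = list(date = as.Date(df$date)),
                   FUN = mean, na.rm = TRUE)
# daily now has one row per day: means 10 and 20
```

The same call applied to the real `df.pm102013` yields the 365 (or 366) daily rows per station.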
Does anyone know if there's a way to 'fix' the lon/lat problem? Or is there a more efficient way to extract and average the data?
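A likely cause of the NAs: raster cannot use the 2-D lon/lat variables of the 2012 file, so the brick gets an index-based extent and `extract()` at real-world coordinates falls outside it. The `head(lon)` output above suggests the grid is actually regular (lon constant along y), so the 2-D arrays can be collapsed to 1-D vectors and the extent rebuilt. A sketch assuming the ncdf4 and raster packages, not tested against the actual file:

```r
library(ncdf4)
library(raster)

# Read the 2-D coordinate arrays from the 2012 file
nc  <- nc_open("E:/ENSa.2012.PM10.yearlyrea_.nc")
lon <- ncvar_get(nc, "lon")   # 701 x 401 matrix
lat <- ncvar_get(nc, "lat")
nc_close(nc)

# Collapse to 1-D, checking first that the grid really is regular:
# every column of lon and every row of lat must be identical
lon1d <- lon[, 1]   # lon varies along x only
lat1d <- lat[1, ]   # lat varies along y only
stopifnot(all(lon == lon1d), all(t(lat) == lat1d))

# Give the brick a proper geographic extent built from the 1-D vectors
# (strictly, half a cell width should be added on each side, since these
# are cell centres, not edges)
pm102012 <- brick("E:/ENSa.2012.PM10.yearlyrea_.nc", varname = "PM10")
extent(pm102012) <- extent(min(lon1d), max(lon1d), min(lat1d), max(lat1d))

# extract(pm102012, coords, df = TRUE) should now return values, not NAs
```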
You can extract the nearest grid point to a given lon/lat and compute the daily average using CDO from the command line in bash:
lon=34.4
lat=22.1
cdo daymean -remapnn,lon=${lon}/lat=${lat} input.nc output_${lon}_${lat}.nc
The minus sign on remapnn means its result is piped into the daymean command. You can put this in a bash loop over the desired points.
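Such a loop could look like this (a dry-run sketch: the lists hold the first three station coordinates from the script in the question, `input.nc` is a placeholder, and the cdo command is only echoed):

```shell
#!/bin/sh
# Build one CDO command per station by pairing parallel lon/lat lists.
lons="-2.094278 -1.830583 -2.584482"
lats="57.15736 52.511722 51.462839"

set -- $lats                 # positional parameters hold the latitudes
for lon in $lons; do
  lat=$1; shift              # pair each lon with the matching lat
  # Remove the echo to actually run CDO:
  echo cdo daymean -remapnn,lon=${lon}/lat=${lat} input.nc output_${lon}_${lat}.nc
done
```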