Search code examples
rsubsetlatitude-longitudenetcdfcdo-climate

How to take a subset from a netCDF file using latitude/longitude boundaries in R


I have a netCDF file that I wish to extract a subset from defined by latitude/longitude boundaries (i.e. a lat/long defined box), using the ‘ncdf’ package in R.

A summary of my netCDF file is below. It has two dimensions (latitude and longitude) and 1 variable (10U_GDS4_SFC). It is essentially a lat/long grid containing wind values:

[1] "file example.nc has 2 dimensions:"
[1] "lat_0   Size: 1280"
[1] "lon_1   Size: 2560"
[1] "------------------------"
[1] "file example.nc has 1 variables:"
[1] "float 10U_GDS4_SFC[lon_1,lat_0]  Longname:10 metre U wind component Missval:1e+30"

The latitude variable runs from +90 to -90 and the longitude variable runs form 0 to 360.

I wish to extract a subset of the overall grid using the following geographical corner boundaries:

bottom left corner: Lat: 34.5˚, Long: 355˚, top left corner: Lat: 44.5˚, Long: 355˚, top right corner: Lat: 44.5˚, Long: 12˚, bottom right corner: Lat: 34.5˚ , Long: 12˚

I am aware that parts of a variable can be extracted using the get.var.ncdf() command (example below):

z1 = get.var.ncdf(example.nc, "10U_GDS4_SFC", start=c(11,26), count=c(5,5))

However, I can’t work out how lat/long can be incorporated so that I end up with a subsetted spatial grid containing variable values. I am new to working with netCDF values in R, and any advice would be greatly appreciated. Many thanks!


Solution

  • In principle you are 2/3 of the way there. You can of course create the starting indices using something like this:

    require(ncdf4)
    
    ncFile <- nc_open( MyNetCDF )
    LonStartIdx <- which( ncFile$dim$lon$vals == 355)
    LatStartIdx <- which( ncFile$dim$lat$vals == 34.5)
    

    Do the same for the counts. Then, read the variable you want

    MyVariable <- ncvar_get( ncFile, varName, start=c( LonStartIdx, LatStartIdx), count=...)
    

    However in your case you are out of luck as far as I know. The reading / writing netcdf routines do their stuff sequentially. Your grid wraps around since you have coordinates that go from 0 - 360 in longitude and you are interested in a box that contains the zero meridian.

    For you (assuming you have not too much data) it would make more sense to read in the full grid into R, and then use either subset or create indices using which and cut out your "box" in R.

    ncFile <- nc_open( MyNetCDF )
    LonIdx <- which( ncFile$dim$lon$vals > 355 | ncFile$dim$lon$vals < 10)
    LatIdx <- which( ncFile$dim$lat$vals > 34.5 & ncFile$dim$lat$vals < 44.5)
    MyVariable <- ncvar_get( ncFile, varName)[ LonIdx, LatIdx]
    nc_close(ncFile)
    

    Remark: I prefer ncdf4, I find the syntax a bit easier to remember (and there was another advantage over the older netcdf R-package that I have forgotten...)

    Ok. Comments cannot be as long as I would need them, so I updated the answer No worries. Let's go through the questions step by step.

    • The which function way will work. I use it myself.
    • The data will be in a similar format as in the netcf file, but I am not too sure if there is some problem with the 0 meridian (I guess yes). You might have to swap the two halves by doing something like this (replace the corresponding line in the 2nd example)

      LonIdx <- c(which( ncFile$dim$lon$vals > 355) , which( ncFile$dim$lon$vals < 10) )
      

      This changes the order of the coordinate indices so that the Western part comes first and then the Eastern.

    • Reformatting everything to a 2x3 data frame is possible. Take the data my 2nd code example returns (will be a matrix, [lon x lat]. Also get the values of the coordinates from

      lon <- ncFile$dim$lon$val[LonIdx]
      

      (or how longitude is called in your example, same for lat). Then assemble the matrix using

      cbind( rep(lat, each=length(lon)), rep(lon,length(lat)), c(myVariable) )
      
    • The coordinates will of course be the same as in the netcdf file...

    You need to samity check the last cbind, as I am only about 98% confident that I have not messed up the coordinates. In the R scripts I found on my desktop I use loops, which are... evil... This should be (a bit?) faster and is also more sensible.