Search code examples
rarraysgisnetcdf4

Why manipulated netCDF file in R environment produces rasters with a different resolution compared to the ones created in ArcGIS?


I'm dealing with a netcdf4 file downloaded from the Marine Environment Monitoring Service of Copernicus and I think I miss something here...

Consider my approach in the following code to extract values from the nc file. My scope is to reanalyze the data stored in the time series with a linear regression in order to get the mean SST change per day per pixel (not quite sure if I'm getting the statistic I want here! I'm open to suggestions! anyway this is not the point of my thread).

library(ncdf4)
library(raster)
nc_data <- nc_open('L4_analyzed_sst_005res_25081981_31122018.nc')
print(nc_data) #it shows metadata

lon <- ncvar_get(nc_data, "lon")
lat <- ncvar_get(nc_data, "lat", verbose = F)
t <- ncvar_get(nc_data, "time")
t1 <- as.POSIXct(t, origin="1981-01-01", format="%Y-%m-%d", tz = "UTC")

sst.array <- ncvar_get(nc_data, "analysed_sst") # store the data in a 3-dimensional array (long, lat, time)
dim(sst.array)

fillvalue <- ncatt_get(nc_data, "analysed_sst", "_FillValue")
fillvalue
nc_close(nc_data)

sst.array[sst.array == fillvalue$value] <- NA #fill the no value with NA

str(sst.array)

##################################################################################################
#1. sst change for the entire period
#######################################################################################
slope_matrix <- matrix(nrow = length(lon), ncol = length(lat))
for (i in 1:dim(sst.array)[1]){
  for (j in 1:dim(sst.array)[2]){
    value <- sst.array[i,j,]
    if (anyNA(value)==FALSE){ #this "if" speed up the process because in my case the only NA are noValue
    extracted_data <- data.frame(value=value, t=t1)
    extracted_data$quarter <- "null"
    extracted_data$quarter <- ifelse(substr(extracted_data$t, 6,7)=="01"|substr(extracted_data$t, 6,7)=="02"|substr(extracted_data$t, 6,7)=="03", "Q1", extracted_data$quarter)
    extracted_data$quarter <- ifelse(substr(extracted_data$t, 6,7)=="04"|substr(extracted_data$t, 6,7)=="05"|substr(extracted_data$t, 6,7)=="06", "Q2", extracted_data$quarter)
    extracted_data$quarter <- ifelse(substr(extracted_data$t, 6,7)=="07"|substr(extracted_data$t, 6,7)=="08"|substr(extracted_data$t, 6,7)=="09", "Q3", extracted_data$quarter)
    extracted_data$quarter <- ifelse(substr(extracted_data$t, 6,7)=="10"|substr(extracted_data$t, 6,7)=="11"|substr(extracted_data$t, 6,7)=="12", "Q4", extracted_data$quarter)
    model <- lm(value ~ time(t) + quarter, data = extracted_data)
    slope_matrix[i,j] <- mean(predict(model) - extracted_data$value)
    }
  }
}

slope_raster <- raster(t(slope_matrix), xmn=min(lon), xmx=max(lon), ymn=min(lat), ymx=max(lat), crs=CRS("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs+ towgs84=0,0,0"))
plot(slope_raster)
slope_raster <- flip(slope_raster, direction='y')
writeRaster(slope_raster, "sst_change_1981-2018.tif", "GTiff", overwrite=TRUE)

I used the nc file as an array and built a nested for loop in order to model the time series in each pixel. Then, I transformed the matrix of values into a raster, flip it to reorient it, and wrote it as a GeoTIFF for further uses. Now, when I open it in ArcGIS, it does not perfectly overlap with the raster created with the "make netCDF raster layer" tool. The raster created in R has slightly a smaller resolution (0.049 vs 0.050).

Do you know what's the problem here?

Thank you for the help!

EDIT: here you can download the data.


Solution

  • There are much easier ways to deal with ncdf files with spatial data. Below I open the file with the raster package (that has R based code for this) and with the terra package (that uses the GDAL library). They give the same resolution; so it safe to assume that this is correct.

    f <- "L4_analyzed_sst_005res_25081981_31122018.nc"
    library(raster)
    b <- brick(f)
    b
    #class      : RasterBrick 
    #dimensions : 34, 48, 1632, 13643  (nrow, ncol, ncell, nlayers)
    #resolution : 0.05004599, 0.05015772  (x, y)
    #extent     : 13.22879, 15.631, 39.52957, 41.23494  (xmin, xmax, ymin, ymax)
    #crs        : +proj=longlat +datum=WGS84 +no_defs 
    #source     : L4_analyzed_sst_005res_25081981_31122018.nc 
    #names      : X1981.08.25, X1981.08.26, X1981.08.27, X1981.08.28, X1981.08.29, X1981.08.30, X1981.08.31, X1981.09.01, X1981.09.02, X1981.09.03, X1981.09.04, X1981.09.05, X1981.09.06, X1981.09.07, X1981.09.08, ... 
    #Date/time  : 1981-08-25, 2018-12-31 (min, max)
    #varname    : analysed_sst 
    

    Or with terra

    x <- rast(f)
    x
    #class       : SpatRaster 
    #dimensions  : 34, 48, 13643  (nrow, ncol, nlyr)
    #resolution  : 0.05004599, 0.05015772  (x, y)
    #extent      : 13.22879, 15.631, 39.52957, 41.23494  (xmin, xmax, ymin, ymax)
    #coord. ref. : +proj=longlat +datum=WGS84 +no_defs 
    #data source : L4_analyzed_sst_005res_25081981_31122018.nc 
    #names       : L4__1, L4__2, L4__3, L4__4, L4__5, L4__6, ... 
     
    

    The resolution that you use is wrong because you do:

    raster(nrow=34, ncol=48, xmn=min(lon), xmx=max(lon), ymn=min(lat), ymx=max(lat))
    #class      : RasterLayer 
    #dimensions : 34, 48, 1632  (nrow, ncol, ncell)
    #resolution : 0.04900336, 0.04868249  (x, y)
    #extent     : 13.25381, 15.60598, 39.55465, 41.20986  (xmin, xmax, ymin, ymax)
    #crs        : +proj=longlat +datum=WGS84 +no_defs 
    

    You are using the lon/lat values of the centers of the corner cells. But what you need to supply are the coordinates of the outer edge of the corner cells.

    Unrelated to your question, but you could now go on and compute the quarters (outside any loop --- no need to do it again and again for each cell?)

    z <- getZ(b)
    month <- as.numeric(substr(z, 6 ,7 ))
    quarter <- ((month-1) %% 4 ) + 1
    # or quarter <- trunc((month-1) / 3) + 1
    table(quarter)
    #quarter
    #   1    2    3    4 
    #3434 3333 3435 3441 
    

    And see ?calc for examples of doing cell-based regression