Search code examples
rgeospatialterrar-stars

stack geotiff with stars 'along' when 'band' dimension contains band + time information


I have a timeseries of geotiff files I'd like to stack in R using stars. Here's the first two:

urls <- paste0("/vsicurl/",
"https://sdsc.osn.xsede.org/bio230014-bucket01/neon4cast-drivers/",
"noaa/gefs-v12/cogs/gefs.20221201/",
c("gep01.t00z.pgrb2a.0p50.f003.tif", "gep01.t00z.pgrb2a.0p50.f006.tif"))

library(stars)
stars::read_stars(urls, along="time")

Errors with:

Error in c.stars_proxy(`3` = list(gep01.t00z.pgrb2a.0p50.f003.tif = "/vsicurl/https://sdsc.osn.xsede.org/bio230014-bucket01/neon4cast-drivers/noaa/gefs-v12/cogs/gefs.20221201/gep01.t00z.pgrb2a.0p50.f003.tif"),  : 
  don't know how to merge arrays: please specify parameter along

Context: bands contain both time+band info

This fails because the dimensions do not match, which happens because the files have concatenated temporal information into the band names:

x<- lapply(urls, read_stars)
x

produces:

[[1]]
stars object with 3 dimensions and 1 attribute
attribute(s), summary of first 1e+05 cells:
                                       Min.  1st Qu. Median     Mean  3rd Qu.     Max.
gep01.t00z.pgrb2a.0p50.f003.ti...  50026.01 98094.81 101138 98347.42 101845.2 104605.2
dimension(s):
     from  to  offset delta                       refsys point
x       1 720 -180.25   0.5 Coordinate System importe... FALSE
y       1 361   90.25  -0.5 Coordinate System importe... FALSE
band    1   8      NA    NA                           NA    NA
                                                           values x/y
x                                                            NULL [x]
y                                                            NULL [y]
band PRES:surface:3 hour fcst,...,DLWRF:surface:0-3 hour ave fcst    

[[2]]
stars object with 3 dimensions and 1 attribute
attribute(s), summary of first 1e+05 cells:
                                       Min.  1st Qu.   Median     Mean 3rd Qu.     Max.
gep01.t00z.pgrb2a.0p50.f006.ti...  50029.83 98101.83 101170.6 98337.52  101825 104588.2
dimension(s):
     from  to  offset delta                       refsys point
x       1 720 -180.25   0.5 Coordinate System importe... FALSE
y       1 361   90.25  -0.5 Coordinate System importe... FALSE
band    1   8      NA    NA                           NA    NA
                                                           values x/y
x                                                            NULL [x]
y                                                            NULL [y]
band PRES:surface:6 hour fcst,...,DLWRF:surface:0-6 hour ave fcst    

Note the band names would align except for the existence of the timestamp being tacked on, e.g. PRES:surface:3 hour fcst vs PRES:surface:6 hour fcst.

How can I best read in these files so that I have dimensions of x,y,band, and time in my stars object?

alternatives: terra?

How about terra? Note that terra is happy to read these files in directly, but treats this as 16 unique bands. Can I re-align that so that I have the original 8 bands along a new "time" dimension? (I recognize stars emphasizes 'spatio-temporal', maybe the such a cube is out of scope to terra?) Also note that terra for some reason mangles the timestamp in these band names:

x <- terra::rast(urls)
x
class       : SpatRaster 
dimensions  : 361, 720, 16  (nrow, ncol, nlyr)
resolution  : 0.5, 0.5  (x, y)
extent      : -180.25, 179.75, -90.25, 90.25  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat Coordinate System imported from GRIB file 
sources     : gep01.t00z.pgrb2a.0p50.f003.tif  (8 layers) 
              gep01.t00z.pgrb2a.0p50.f006.tif  (8 layers) 
names       : PRES:~ fcst, TMP:2~ fcst, RH:2 ~ fcst, UGRD:~ fcst, VGRD:~ fcst, APCP:~ fcst, .

Solution

  • With terra it is pretty easy to make a time-series for each variable as I show below.

    urls <- paste0("/vsicurl/",
    "https://sdsc.osn.xsede.org/bio230014-bucket01/neon4cast-drivers/",
    "noaa/gefs-v12/cogs/gefs.20221201/",
    c("gep01.t00z.pgrb2a.0p50.f003.tif", "gep01.t00z.pgrb2a.0p50.f006.tif"))
    
    library(terra)
    r <- rast(urls)
    

    Extract two variables of interest

    nms <- names(r)
    tmp <- r[[grep("TMP", nms)]]
    rh <- r[[grep("RH", nms)]]
    
    # set time
    tm <- as.POSIXct("2022-12-01", tz="GMT") + c(3,6) * 3600
    time(rh) <- tm 
    time(tmp) <- tm
    

    And you could combine them into a SpatRasterDatset like this:

    s <- sds(list(tmp=tmp, rh=rh))
    

    An alternative path to get to the same point would be to start with a SpatRasterDataset and subset it.

    sd <- sds(urls)
    nl <- 1:length(sd)
    nms <- names(sd[1])
    
    tmp2 <- rast(sd[nl, grep("TMP", nms)])
    time(tmp2) <- tm
    
    rh2 <- rast(sd[nl, grep("RH", nms)])
    time(rh2) <- tm
    

    I made the subsetting work a little nicer in terra version 1.7-5

    urls <- paste0("/vsicurl/",
    "https://sdsc.osn.xsede.org/bio230014-bucket01/neon4cast-drivers/",
    "noaa/gefs-v12/cogs/gefs.20221201/", c("gep01.t00z.pgrb2a.0p50.f003.tif", "gep01.t00z.pgrb2a.0p50.f006.tif"))
    
    library(terra)
    #terra 1.7.5
    sd <- sds(urls)
    tmp <- sd[,2]
    
    tmp
    #class       : SpatRaster 
    #dimensions  : 361, 720, 2  (nrow, ncol, nlyr)
    #resolution  : 0.5, 0.5  (x, y)
    #extent      : -180.25, 179.75, -90.25, 90.25  (xmin, xmax, ymin, ymax)
    #coord. ref. : lon/lat Coordinate System imported from GRIB file 
    #sources     : gep01.t00z.pgrb2a.0p50.f003.tif  
    #              gep01.t00z.pgrb2a.0p50.f006.tif  
    #names       : TMP:2 m above g~Temperature [C], TMP:2 m above g~Temperature [C] 
    #unit        :                               C,                               C 
    #time        : 2022-12-01 03:00:00 to 2022-12-01 06:00:00 UTC 
    

    As for the layer names containing the forecast time, that is just because that is what is in the tif metadata. It looks like that was a decision made when they were created from the original GRIB files.

    The latitude extent going beyond the north and south poles is an interesting feature of this dataset.