Select rasters in stack based on layer partial name match


I have a stack of rasters (one per species) and then I have a data frame with lat/long columns along with a species name.

fls = list.files(pattern="median")
s <- stack(fls)
df<-c("x","y","species name")

I want to select one raster at a time to use with an extract function, choosing it by a partial match against the species name column. The raster names might not match the names in the species list exactly: there might be a lower/upper case mismatch, the raster layer name might be longer (for example "species_name_median"), or it might contain "_" instead of a blank.

for(i:length(df.species name))
{
  result<-extract(s[[partial match to "species name[i]" ]],df.xy)
}

I hope it is clear that I just want to use one raster at a time for the extraction. I can easily select a single raster using s[[i]], but there is no guarantee that every species in the list has an equivalent raster.


Solution

  • If the points to query are in a data.frame of x and y coordinates, together with the species name that identifies the layer to query, you can use these two commands to do everything:

    #  Find the layer to match on using 'grepl' and 'which' converting all names to lowercase for consistency
    df$layer <- lapply( df$species , function(x) which( grepl( tolower(x) , tolower(names(s)) ) ) )
    
    
    # Extract each value from the appropriate layer in the stack
    df$Value <- sapply( seq_len(nrow(df)) , function(x) extract( s[[ df$layer[x] ]] , df[ x , 1:2 ] ) )
    

    How it works

    Starting from the first line:

    • First we define a new column vector df$layer which will be the index of the rasterLayer in the stack that we need to use for that row.
    • lapply iterates along all the elements in the column df$species and applies an anonymous function using each item in df$species as an input variable x in turn. lapply is a loop construct even though it doesn't look like one.
    • On the first iteration we take the first element of df$species, which is now x, and use it in grepl (grep stands for 'global regular expression print'; the trailing l means it returns a logical vector) to find which elements of the names of our stack s contain our species pattern. We use tolower() on both the pattern to match (x) and the elements to match in (names(s)) so that we still match when the case differs; otherwise "Tiger" would not find "tiger".
    • grepl returns a logical vector marking the elements in which it found the pattern, e.g. grepl( "abc" , c("xyz", "wxy" , "acb" , "zxabcty" ) ) returns FALSE , FALSE , FALSE , TRUE (only "zxabcty" contains "abc"). We use which to get the index of those elements.
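    • The idea is that we get one, and only one, match of a layer in the stack to the species name for each row, so the only TRUE index will be the index of the layer in the stack we want. If a species matches no layer, or more than one, the second line will not work as intended for that row.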
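
    The matching step can be tried on its own with plain character vectors. The sketch below is not part of the two commands above: normalise and match_layer are hypothetical helper names, and it assumes that blanks, "_" and "." should all be treated as the same separator, which would also cover the "_" instead of a blank case from the question. It uses fixed = TRUE so the species name is matched literally rather than as a regular expression, and it returns NA when no layer, or more than one layer, matches.

```r
# Hypothetical helper: lower-case a name and collapse blanks, "_" and "."
# into a single "_" so that "Snow leopard" and "snow_leopard_median" agree.
normalise <- function(x) gsub("[ _.]+", "_", tolower(x))

# Hypothetical helper: index of the single layer whose name contains the
# species name, or NA if there is no match or the match is ambiguous.
match_layer <- function(species, layer_names) {
  hits <- which(grepl(normalise(species), normalise(layer_names), fixed = TRUE))
  if (length(hits) == 1L) hits else NA_integer_
}

# Made-up layer names for illustration only
layer_names <- c("LIon_medIan", "PANTHeR_MEAN_AVG", "snow_leopard_median")

match_layer("Snow leopard", layer_names)  # 3
match_layer("lion", layer_names)          # 1
match_layer("tiger", layer_names)         # NA, no such layer
```

    With the stack from the question, names(s) would take the place of layer_names.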

    On the second line, sapply:

    • sapply is much like lapply, but it simplifies the result to a vector where it can rather than returning a list of values. TBH you could use either in this use-case.
    • Now we iterate across a sequence of numbers from 1 to nrow(df).
    • We use the row number in another anonymous function as our input variable x
    • We extract the raster value at the "x" and "y" coordinates (columns 1 and 2 respectively) of the current row (given by x) of the data.frame, using the layer that we found on the previous line.
    • We assign the result of doing all this to another column in our data.frame which contains the extracted value for that x/y coord for the appropriate layer
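
    Since the question notes that not every species is guaranteed to have a raster, here is a hedged variant of the two lines that returns NA for rows without exactly one matching layer instead of failing. It assumes the raster package is installed and that the data.frame has columns x, y and species as above. extract_matched is a hypothetical name, not a raster function, and the s_demo and df_demo objects are made up for the demonstration.

```r
library(raster)

# Hypothetical helper: for each row of df, extract the value at (x, y) from
# the single layer of s whose name contains the species name; NA otherwise.
extract_matched <- function(s, df) {
  vapply(seq_len(nrow(df)), function(i) {
    hits <- which(grepl(tolower(df$species[i]), tolower(names(s)), fixed = TRUE))
    if (length(hits) != 1L) return(NA_real_)   # no layer, or ambiguous match
    as.numeric(raster::extract(s[[hits]], df[i, c("x", "y")]))
  }, numeric(1))
}

# Small demonstration with constant-valued layers (all 1s and all 100s)
s_demo <- stack(raster(matrix(1, 2, 2)), raster(matrix(100, 2, 2)))
names(s_demo) <- c("LIon_medIan", "PANTHeR_MEAN_AVG")
df_demo <- data.frame(x = c(0.25, 0.25, 0.25), y = c(0.25, 0.25, 0.25),
                      species = c("lion", "panther", "Tiger"),
                      stringsAsFactors = FALSE)

extract_matched(s_demo, df_demo)  # 1 100 NA
```

    The "Tiger" row has no layer in s_demo, so it comes back as NA rather than stopping the whole extraction.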

    I hope that helps!!

    And a worked example with some data:

    require( raster )
    #  Sample rasters - note the scale of values in each layer  
    # Tens
    r1 <- raster( matrix( sample(1:10,100,repl=TRUE) , ncol = 10 ) )    
    # Hundreds
    r2 <- raster( matrix( sample(1e2:1.1e2,100,repl=TRUE) , ncol = 10 ) )   
    # Thousands
    r3 <- raster( matrix( sample(1e3:1.1e3,100,repl=TRUE) , ncol = 10 ) )
    
    #  Stack the rasters
    s <- stack( r1,r2,r3 )
    #  Name the layers in the stack
    names(s) <- c("LIon_medIan" , "PANTHeR_MEAN_AVG" , "tiger.Mean.JULY_2012")
    
    
    #  Data of points to query on
    df <- data.frame( x = runif(10) , y = runif(10) , species = sample( c("lion" , "panther" , "Tiger" ) , 10 , repl = TRUE ) )
    
    #  Run the previous code
    df$layer <- lapply( df$species , function(x) which( grepl( tolower(x) , tolower(names(s)) ) ) )
    df$Value <- sapply( seq_len(nrow(df)) , function(x) extract( s[[ df$layer[x] ]] , df[ x , 1:2 ] ) )
    
    #  And the result (note the scale of Values is consistent with the scale of values in each rasterLayer in the stack)
    df
    #          x         y species layer Value
    #1  0.4827577 0.7517476    lion     1     1
    #2  0.8590993 0.9929104    lion     1     3
    #3  0.8987446 0.4465397   Tiger     3  1084
    #4  0.5935572 0.6591223 panther     2   107
    #5  0.6382287 0.1579990 panther     2   103
    #6  0.7957626 0.7931233    lion     1     4
    #7  0.2836228 0.3689158   Tiger     3  1076
    #8  0.5213569 0.7156062    lion     1     3
    #9  0.6828245 0.1352709 panther     2   103
    #10 0.7030304 0.8049597 panther     2   105