
Compute the mean of selected columns in a list of sf objects and store the values in a data frame


Let's say I have a list of sf objects, and I would like to compute the mean of selected columns. Afterwards, I would like to store these values in separate columns in a new data frame. The sample data can be downloaded from here. Below is what I have done so far. How can this be fixed?

library(sf)

# List the paths of the two sample shapefiles, "a" and "b"
myfiles = list.files(path = "~",
                     pattern = "\\.shp$", full.names = TRUE)

# Read each shapefile and return a list of sf objects
listOfShp = lapply(myfiles, st_read)
 
# First make an empty df
time.series = data.frame()
# Start a loop
for (i in listOfShp){
  time.series$Mean.Z = data.frame(mean(i$z)) 
  time.series$Intensity.mean = data.frame(mean(i$V4))
}


Error in `$<-.data.frame`(`*tmp*`, "Mean.Z", value = list(mean.i.z. = -4.19655105979791)) : 
  replacement has 1 row, data has 0

Solution

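  • The error occurs because time.series has zero rows, while each assignment in the loop tries to add a one-row data frame as a column, so the row counts do not match. Even with matching row counts, the loop would overwrite the same column on every iteration instead of adding a row per shapefile. What you probably want is something like this: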

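    library(magrittr)  # provides the %>% pipe

    # Build one summary row per sf object, then bind the rows together
    time.series <-
      listOfShp %>%
      purrr::map_df(
        function(df_) {
          data.frame(
            Mean.Z = mean(df_$z),
            Intensity.mean = mean(df_$V4)
          )
        }
      )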
    

    This solution iterates over listOfShp. For each sf object in the list, it applies the function, which creates a one-row data frame with two columns. After it has created a data frame for each element in the list, it binds them together by row into a single data frame.
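
    The same two steps can also be sketched in base R, without purrr. This is only an illustration: the two small data frames below are made-up stand-ins for the sf objects in listOfShp (column access with $ works the same way on sf objects), and do.call(rbind, ...) is used here in place of map_df.

    ```r
    # Made-up stand-ins for the sf objects in listOfShp (assumption, not the
    # asker's data); sf objects are data frames, so `$` behaves the same way.
    listOfShp <- list(
      data.frame(z = c(1, 2, 3), V4 = c(10, 20, 30)),
      data.frame(z = c(4, 6),    V4 = c(5, 15))
    )

    # Step 1: build a one-row data frame of summaries for each list element.
    per_file <- lapply(listOfShp, function(df_) {
      data.frame(
        Mean.Z = mean(df_$z),
        Intensity.mean = mean(df_$V4)
      )
    })

    # Step 2: stack the one-row data frames into a single data frame.
    time.series <- do.call(rbind, per_file)
    print(time.series)
    ```

    The result has one row per list element and one column per summary statistic.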

    An even more elegant solution that carries along the file names might be:

    
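    library(sf)
    library(magrittr)  # provides the %>% pipe

    # Function that takes a file name as input and returns some summary
    # statistics about the shapefile's attribute table:
    describe_shapefile <- function(shp_path) {
      # Read the shapefile and drop the geometry column, leaving a plain data frame
      sf_df <- st_read(shp_path) %>%
        st_set_geometry(NULL)
      mean_z <- mean(sf_df$z)
      int_mean <- mean(sf_df$V4)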
      data.frame(
        filename = shp_path,
        Mean.Z = mean_z,
        Intensity.mean = int_mean
      )
    }
    
    # Apply the function to each file in the list
    myfiles %>%
      purrr::map_df(describe_shapefile)
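
    If you would rather keep the for loop from the question, one possible fix is to preallocate a data frame with one row per list element and fill it by index. The sketch below assumes the same column names (z and V4) and uses two made-up data frames as stand-ins for the sf objects, so it runs without any shapefiles.

    ```r
    # Made-up stand-ins for the sf objects in listOfShp (assumption, not the
    # asker's data).
    listOfShp <- list(
      data.frame(z = c(1, 2, 3), V4 = c(10, 20, 30)),
      data.frame(z = c(4, 6),    V4 = c(5, 15))
    )

    # Preallocate one row per list element so indexed assignment has a target.
    n <- length(listOfShp)
    time.series <- data.frame(
      Mean.Z = numeric(n),
      Intensity.mean = numeric(n)
    )

    # Loop over positions (not elements) so each result goes into its own row.
    for (i in seq_along(listOfShp)) {
      time.series$Mean.Z[i] <- mean(listOfShp[[i]]$z)
      time.series$Intensity.mean[i] <- mean(listOfShp[[i]]$V4)
    }
    print(time.series)
    ```

    Iterating with seq_along gives each result a row index, which the original loop over the list elements did not have.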