I have around 30 dataframes with a varying number of samples, but the same metadata columns. For example, the columns are Sample ID, Date of collection, Place of collection, and Days since sample collection, to mention a few.
I want to summarize them based on "Place of collection" and "Days since sample collection". For this I'm using the below function -
check_summary_df <- function(x) {
  summarized_data <- x %>%
    group_by(place_of_collection, day) %>%
    summarize(count = n())
  # adding this as a column so I can track the df_name
  summarized_data$df_name <- deparse(substitute(x))
  return(summarized_data)
}
And it is providing me with a dataframe with the required summary. My df names are non-standard, so I have put them in a list using input_df_list <- c('df1','collected_by_x','collected_by_y') and now I want to loop the function over the list. I tried a simple for loop -
for (i in 1:length(input_df_list)) { check_summary_df(input_df_list[i])}
And got the below error -
Error in UseMethod("group_by") :
no applicable method for 'group_by' applied to an object of class "character"
From what I am seeing, the input_df_list[i] of the loop is being treated as a character string rather than as a dataframe. How can I change this behaviour? Or is there any other way to loop over a list of data frames?
The idiomatic way to do this in R is to create a list of data frames, rather than a list of names, and then iterate over that. As you already have input_df_list, a character vector of names, you can do this with get(). Here's an example:
# Vector of names
input_df_list <- c("iris", "mtcars", "cars")
# Create a list of data frames
df_list <- lapply(input_df_list, \(nm) get(nm)) |>
  setNames(input_df_list)
# Simple function we can apply to all data frames
check_summary_df <- function(dat) {
  names(dat)
}
# Apply function to each data frame
lapply(df_list, check_summary_df)
# $iris
# [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
# $mtcars
# [1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear" "carb"
# $cars
# [1] "speed" "dist"
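The same pattern can be carried over to the summary in the question. The following is only a sketch under some assumptions: the two small data frames are made-up stand-ins for the real data, and the column names place_of_collection and day are taken from the question's function. Note that inside lapply(), deparse(substitute(x)) no longer returns the data frame's name (it gives the internal expression X[[i]]), so this sketch takes the names from the list instead, via the .id argument of dplyr::bind_rows().

```r
library(dplyr)

# Made-up stand-ins for the real data frames (assumption for illustration)
df1 <- data.frame(
  place_of_collection = c("A", "A", "B"),
  day = c(1, 1, 2)
)
collected_by_x <- data.frame(
  place_of_collection = c("B", "C"),
  day = c(3, 3)
)

# Vector of names, as in the question
input_df_list <- c("df1", "collected_by_x")

# Named list of data frames
df_list <- lapply(input_df_list, \(nm) get(nm)) |>
  setNames(input_df_list)

# Summary function without the name-tracking step
check_summary_df <- function(dat) {
  dat %>%
    group_by(place_of_collection, day) %>%
    summarize(count = n(), .groups = "drop")
}

# Apply to every data frame and stack the results;
# .id turns the list names into a df_name column
summary_all <- bind_rows(lapply(df_list, check_summary_df), .id = "df_name")
```

Here summary_all is a single data frame with one row per combination of df_name, place_of_collection and day, which is usually easier to work with than 30 separate summaries.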
You could also add x <- get(x) as the first line of your function and pass it the names directly, but you'll find your R code will be more readable if you work with lists of data frames.
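For completeness, the get()-inside-the-function variant might look roughly like the sketch below. The function name check_summary_by_name and the toy data frames are hypothetical, introduced only for illustration. Since the function now receives the name as a string, that string can be used directly for the df_name column. The loop also stores each result, which the for loop in the question did not do.

```r
library(dplyr)

# Made-up stand-ins for the real data frames (assumption for illustration)
df1 <- data.frame(place_of_collection = c("A", "A", "B"), day = c(1, 1, 2))
collected_by_x <- data.frame(place_of_collection = c("B", "C"), day = c(3, 3))
input_df_list <- c("df1", "collected_by_x")

# Hypothetical variant of the question's function that takes a name (string)
check_summary_by_name <- function(nm) {
  dat <- get(nm)  # look up the data frame called nm
  dat %>%
    group_by(place_of_collection, day) %>%
    summarize(count = n(), .groups = "drop") %>%
    mutate(df_name = nm)  # the string itself is the name to track
}

# Pre-allocate a list and keep each summary
results <- vector("list", length(input_df_list))
for (i in seq_along(input_df_list)) {
  results[[i]] <- check_summary_by_name(input_df_list[i])
}

# Stack the per-data-frame summaries into one data frame
summary_all <- bind_rows(results)
```

This works, but the function now depends on objects existing under particular names in the calling environment, which is one reason the list-of-data-frames approach tends to be easier to read and test.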