I have the following data frame.
Data_Frame <- data.frame(Factor_1 = rep(LETTERS[1:4], each = 12, length.out = 48), Factor_2 = rep(letters[1:3], each = 4, length.out = 48), Factor_3 = rep(1:2, each = 2, length.out = 48), Response = rnorm(48, 25, 1))
I want to create a nested list by splitting the data frame by each of the factors in the study in succession. I'll start with a vector of the column names to split by, listed in the order in which the resulting list should be nested.
Factors_to_Split_by <- c("Factor_1", "Factor_2", "Factor_3")
The resulting list should look like the following Output object.
Output <- lapply(lapply(split(Data_Frame, Data_Frame[, which(colnames(Data_Frame) == Factors_to_Split_by[1])]), function (x) {
  split(x, x[, which(colnames(x) == Factors_to_Split_by[2])])
}), function (x) {
  lapply(x, function (y) {
    split(y, y[, which(colnames(y) == Factors_to_Split_by[3])])
  })
})
How can I write a recursive function that takes Factors_to_Split_by as the input and returns the desired Output list? I may have more than 3 factors to split the data by, and I'd like something modular, efficient, and programmatic.
Thanks!
Here is one possible approach using Reduce and a custom function:
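split_df <- function(x, split) {
  if (is.data.frame(x)) {
    # Leaf reached: split this data frame by the named column.
    split(x, x[split])
  } else {
    # Still a list: descend one level and apply the same split to each element.
    lapply(x, split_df, split = split)
  }
}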
Output2 <- Reduce(split_df, Factors_to_Split_by, init = Data_Frame)
identical(Output, Output2)
#> [1] TRUE
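Since the question asks specifically for a recursive function, here is a minimal sketch of an explicitly recursive alternative. The name split_recursive is a hypothetical helper, not part of base R, and the set.seed call is only there to make the example reproducible. It splits on the first factor and then calls itself on each piece with the remaining factors, so it should produce the same nesting as the Reduce approach.

```r
# Same example data as in the question (seed added only for reproducibility).
set.seed(1)
Data_Frame <- data.frame(
  Factor_1 = rep(LETTERS[1:4], each = 12, length.out = 48),
  Factor_2 = rep(letters[1:3], each = 4, length.out = 48),
  Factor_3 = rep(1:2, each = 2, length.out = 48),
  Response = rnorm(48, 25, 1)
)
Factors_to_Split_by <- c("Factor_1", "Factor_2", "Factor_3")

# split_recursive is a hypothetical helper name, not a base R function.
split_recursive <- function(df, factors) {
  # Base case: no factors left, so return the data frame unchanged.
  if (length(factors) == 0) {
    return(df)
  }
  # Split on the first factor, then recurse into each piece with the rest.
  pieces <- split(df, df[[factors[1]]])
  lapply(pieces, split_recursive, factors = factors[-1])
}

Output3 <- split_recursive(Data_Frame, Factors_to_Split_by)
names(Output3)
str(Output3$A$a$`1`)
```

If Output from the question is available in the same session, identical(Output, Output3) is a quick way to check that the two structures match. The Reduce version walks the already-nested list once per factor, while this version descends depth-first; for a handful of factors the difference is unlikely to matter.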