I have a dataset like this:
```r
library(dplyr)
set.seed(505)
tempdf1 <-
  data.frame(
    y = rnorm(400),
    x1 = rnorm(400),
    x2 = rnorm(400),
    x3 = sample(1:5, 40, replace = TRUE),  # 40 values, recycled to fill 400 rows
    out = rep(1:2, each = 200),
    imp = rep(1:4, each = 20)              # 80 values, recycled to fill 400 rows
  )
```
There are two different outcomes, indicated by the column `out` ({1, 2}). I want to fit the model separately for each outcome, so I use `lapply` like this:
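```r
tempdf2 <- split(tempdf1, tempdf1$out)  # one data frame per outcome
lapply(tempdf2, function(df) {
  df %>%
    group_by(imp) %>%
    do(model = lm(y ~ x1 + x2, data = .)) %>%  # one model per imp value
    as.list() %>%
    .[[-1]]  # drop the imp column, keep the list of models
})
```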
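For comparison, the same per-outcome, per-`imp` fits can also be written without `do()`, which dplyr marks as superseded since version 1.0.0. This minimal sketch nests two `split()`/`lapply()` calls in base R. It re-creates `tempdf1` so that it runs on its own, and it keeps the `out` and `imp` values as list names, which makes individual models easy to look up later.

```r
# Example data from the question
set.seed(505)
tempdf1 <- data.frame(
  y   = rnorm(400),
  x1  = rnorm(400),
  x2  = rnorm(400),
  x3  = sample(1:5, 40, replace = TRUE),
  out = rep(1:2, each = 200),
  imp = rep(1:4, each = 20)
)

# Outer split: one data frame per outcome; inner split: one per imp value
fits <- lapply(split(tempdf1, tempdf1$out), function(df) {
  lapply(split(df, df$imp), function(d) lm(y ~ x1 + x2, data = d))
})

# Models are addressed by name: outcome first, then imp
fits[["1"]][["3"]]
```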
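I also have a column `x3`, a categorical variable (coded as integers in the example) with five values, {1, 2, 3, 4, 5}. Now I want to fit the same model as above (two outcomes, two separate sets of models via `lapply(tempdf2, function(df) { df %>% group_by(imp) %>% ...`), but each time excluding the rows with one value of `x3`: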
Case 1) Exclude `x3` value 5 and fit the model where `x3` is in {1, 2, 3, 4} (out = 1):
`lapply(tempdf2, function(df) { df %>% group_by(imp) %>% ...`

Case 2) Exclude `x3` value 4 and fit the model where `x3` is in {1, 2, 3, 5} (out = 1):
`lapply(tempdf2, function(df) { df %>% group_by(imp) %>% ...`

Case 3) Exclude `x3` value 3 and fit the model where `x3` is in {1, 2, 4, 5} (out = 1):
`lapply(tempdf2, function(df) { df %>% group_by(imp) %>% ...`

Case 4) Exclude `x3` value 2 and fit the model where `x3` is in {1, 3, 4, 5} (out = 1):
`lapply(tempdf2, function(df) { df %>% group_by(imp) %>% ...`

Case 5) Exclude `x3` value 1 and fit the model where `x3` is in {2, 3, 4, 5} (out = 1):
`lapply(tempdf2, function(df) { df %>% group_by(imp) %>% ...`
This is where I am struggling and need help. I don't know how to add a second, inner `lapply` loop that fits the model above only on subsets of the data, excluding the rows with one value of `x3` at a time.

I'd also like to know which model corresponds to which excluded value of `x3` and which `out` value {1, 2}, so that I can keep track of the results. Thanks in advance for any help.
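One way to get the inner loop described here is to add one more `lapply()` between the split by `out` and the per-`imp` fits. This is only a minimal sketch, and it uses `split()` in place of `group_by()`/`do()`. The list names (`out1`, `drop5`, and so on) follow an illustrative labelling scheme, not anything R requires. They let each model be traced back to its outcome, its excluded `x3` value and its `imp` value.

```r
# Example data from the question
set.seed(505)
tempdf1 <- data.frame(
  y   = rnorm(400),
  x1  = rnorm(400),
  x2  = rnorm(400),
  x3  = sample(1:5, 40, replace = TRUE),
  out = rep(1:2, each = 200),
  imp = rep(1:4, each = 20)
)

x3_values <- sort(unique(tempdf1$x3))

# Outer lapply: one element per outcome (names "1", "2" come from split())
fits <- lapply(split(tempdf1, tempdf1$out), function(df) {
  # Inner lapply: leave out one x3 value at a time
  by_drop <- lapply(x3_values, function(v) {
    sub <- df[df$x3 != v, ]
    # Innermost: one model per imp value, as group_by(imp) did
    lapply(split(sub, sub$imp), function(d) lm(y ~ x1 + x2, data = d))
  })
  setNames(by_drop, paste0("drop", x3_values))
})
names(fits) <- paste0("out", names(fits))

# e.g. outcome 1, first x3 value excluded, imp 1
fits$out1[[1]][["1"]]
```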
Instead of nesting loops and grouping, build a data frame with every combination of `out`, `imp` and `x3` (used as the case number). Then, for each combination, filter the original data frame and fit the model.
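```r
library(dplyr)

# Find the values of out, imp and x3
out <- unique(tempdf1$out)
imp <- unique(tempdf1$imp)
x3  <- sort(unique(tempdf1$x3))

# Create a data frame with every combination
models <- expand.grid(out, imp, x3)
names(models) <- c("out", "imp", "x3")

# Loop through the combinations and fit one model per row.
# Case k excludes x3 value 6 - k (Case 1 drops x3 == 5, ..., Case 5 drops x3 == 1),
# matching the case numbering in the question; this assumes x3 takes the values 1 to 5.
# Inside filter(), x3, imp and out refer to the columns of tempdf1.
output <- lapply(1:nrow(models), function(i) {
  tempdf <- tempdf1 %>%
    filter(x3 != (6 - models$x3[i]), imp == models$imp[i], out == models$out[i])
  lm(y ~ x1 + x2, data = tempdf)
})

# Name each list element after the model's conditions
names(output) <- paste("Out=", models$out, "Imp=", models$imp, "Case:", models$x3)
output
```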
Each list element is named after the conditions of its model. These are the first two elements:
```
$`Out= 1 Imp= 1 Case: 1`

Call:
lm(formula = y ~ x1 + x2, data = tempdf)

Coefficients:
(Intercept)           x1           x2
   -0.24430     -0.36825      0.01901

$`Out= 2 Imp= 1 Case: 1`

Call:
lm(formula = y ~ x1 + x2, data = tempdf)

Coefficients:
(Intercept)           x1           x2
   0.290079    -0.271359    -0.008404
```
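Here is a possible variant of the answer above, sketched under the assumption that labelling cases by the excluded value is acceptable. The grid stores the dropped `x3` value itself in `drop_x3`, a hypothetical column name, so the code no longer depends on `6 - x3` or on `x3` being exactly 1 to 5. Case k in the answer corresponds to `drop_x3 = 6 - k`. The coefficients are then bound back onto the grid, so every row records `out`, `imp`, the excluded value and the fitted coefficients. The sketch uses base-R indexing instead of `filter()` so it runs without dplyr. It selects the same subsets as the dplyr version.

```r
# Example data from the question
set.seed(505)
tempdf1 <- data.frame(
  y   = rnorm(400),
  x1  = rnorm(400),
  x2  = rnorm(400),
  x3  = sample(1:5, 40, replace = TRUE),
  out = rep(1:2, each = 200),
  imp = rep(1:4, each = 20)
)

# One row per model: outcome, imputation, and the x3 value left out
grid <- expand.grid(
  out     = sort(unique(tempdf1$out)),
  imp     = sort(unique(tempdf1$imp)),
  drop_x3 = sort(unique(tempdf1$x3))
)

# Fit each model on the rows matching out and imp, minus one x3 value
fits <- lapply(seq_len(nrow(grid)), function(i) {
  keep <- tempdf1$out == grid$out[i] &
    tempdf1$imp == grid$imp[i] &
    tempdf1$x3 != grid$drop_x3[i]
  lm(y ~ x1 + x2, data = tempdf1[keep, ])
})
names(fits) <- with(grid, paste0("out=", out, " imp=", imp, " drop_x3=", drop_x3))

# Tracking table: the grid columns plus one column per coefficient
results <- cbind(grid, do.call(rbind, lapply(fits, coef)))
head(results)
```

With one row per model, results can be filtered or compared with ordinary data frame tools (for example `results[results$out == 1 & results$drop_x3 == 5, ]`) instead of scrolling through the printed list.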