
R: nested lapply with two opposite conditions


I have a dataset like this:

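library(dplyr)
set.seed(505)
tempdf1 <-
  data.frame(
    y = rnorm(400),
    x1 = rnorm(400),
    x2 = rnorm(400),
    x3 = sample(1:5, 40, replace = TRUE),  # 40 draws, recycled to fill 400 rows
    out = rep(1:2, each = 200),             # rows 1-200: out = 1, rows 201-400: out = 2
    imp = rep(1:4, each = 20)               # 80 values, recycled to fill 400 rows
  )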

The column out indicates two different outcomes, {1,2}.

I want to fit the model separately for these two outcomes.

So I use lapply like this:

tempdf2 <- split(tempdf1, tempdf1$out)

lapply(tempdf2, function(df) {
  df %>%
    group_by(imp) %>%
    do(model = lm(y ~ x1 + x2, data = .)) %>%  # one row per imp, fits in a list column
    pull(model)                                 # extract that list of lm fits
})
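For comparison, here is a minimal base-R sketch of the same per-outcome, per-imp fit, without dplyr. The inner `lapply()` over `split(df, df$imp)` does the job of `group_by(imp) %>% do(...)`, which makes the nesting explicit. `fits_by_out` is a hypothetical name chosen for illustration.

```r
set.seed(505)
tempdf1 <- data.frame(
  y = rnorm(400), x1 = rnorm(400), x2 = rnorm(400),
  x3 = sample(1:5, 40, replace = TRUE),
  out = rep(1:2, each = 200),
  imp = rep(1:4, each = 20)
)

tempdf2 <- split(tempdf1, tempdf1$out)

# Outer loop: one element per outcome; inner loop: one model per imp value
fits_by_out <- lapply(tempdf2, function(df) {
  lapply(split(df, df$imp), function(d) lm(y ~ x1 + x2, data = d))
})

names(fits_by_out)         # "1" "2"
names(fits_by_out[["1"]])  # "1" "2" "3" "4"
```

A single model is then reached by name, e.g. `fits_by_out[["2"]][["3"]]` for out = 2 and imp = 3.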

Column x3 is a categorical variable (stored as an integer in the example data) with 5 values, {1,2,3,4,5}. Now I want to fit the same model as above (2 different outcomes, 2 separate models via lapply(tempdf2, function(df) { df %>% group_by(imp) %>% ....), but leaving out one level of x3 at a time:

When out = 1:

    Case1) Exclude x3 value 5 and fit the model on rows where x3 is in {1,2,3,4} (out = 1):

           lapply(tempdf2,
            function(df) {
             df %>%
               group_by(imp) %>% ....

    Case2) Exclude x3 value 4 and fit the model on rows where x3 is in {1,2,3,5} (out = 1):

           lapply(tempdf2,
            function(df) {
             df %>%
               group_by(imp) %>% ....

    Case3) Exclude x3 value 3 and fit the model on rows where x3 is in {1,2,4,5} (out = 1):

           lapply(tempdf2,
            function(df) {
             df %>%
               group_by(imp) %>% ....

    Case4) Exclude x3 value 2 and fit the model on rows where x3 is in {1,3,4,5} (out = 1):

           lapply(tempdf2,
            function(df) {
             df %>%
               group_by(imp) %>% ....

    Case5) Exclude x3 value 1 and fit the model on rows where x3 is in {2,3,4,5} (out = 1):

           lapply(tempdf2,
            function(df) {
             df %>%
               group_by(imp) %>% ....

Then repeat the same five cases for out = 2.
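The ten (out, case) combinations described above can be written out as a small lookup table. This is a sketch that assumes x3 always takes the values 1 to 5, so that Case k excludes x3 = 6 - k; the column names `case`, `excluded_x3` and `kept_x3` are illustrative.

```r
# Every (case, out) pair: 5 cases x 2 outcomes = 10 rows (case varies fastest)
cases <- expand.grid(case = 1:5, out = 1:2)

# Case k excludes x3 == 6 - k (Case1 drops 5, ..., Case5 drops 1)
cases$excluded_x3 <- 6 - cases$case

# The x3 values that remain in each fit, as a readable string
cases$kept_x3 <- vapply(cases$excluded_x3,
                        function(e) paste(setdiff(1:5, e), collapse = ","),
                        character(1))
cases
```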

This is where I am struggling and I need help.

I don't know how to add a second, inner lapply loop that fits the above model only on subsets of x3, excluding the rows with one level of x3 at a time.

I would also like to know which model corresponds to which excluded level of x3 and which value of out {1,2}, so that I can keep track of the results. Thanks in advance for any help.
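A minimal sketch of a nested-lapply version, using base R only and assuming x3 takes the values 1 to 5: the outer `lapply()` runs over the two outcomes, a middle one leaves out one x3 level at a time, and the innermost one fits one model per imp value. The list names (`out=1`, `drop_x3=5`, ...) record which outcome and which excluded x3 level each model belongs to; `by_out`, `by_drop`, `drop_levels` and the name format are hypothetical choices, not part of the question.

```r
set.seed(505)
tempdf1 <- data.frame(
  y = rnorm(400), x1 = rnorm(400), x2 = rnorm(400),
  x3 = sample(1:5, 40, replace = TRUE),
  out = rep(1:2, each = 200),
  imp = rep(1:4, each = 20)
)

# Excluded x3 level for Case1, ..., Case5 (assumes x3 takes the values 1 to 5)
drop_levels <- 5:1

# Outer loop: one element per outcome (out = 1, 2)
by_out <- lapply(split(tempdf1, tempdf1$out), function(df) {
  # Middle loop: leave out one x3 level at a time
  by_drop <- lapply(drop_levels, function(lev) {
    sub <- df[df$x3 != lev, ]
    # Inner loop: one model per imp value, like group_by(imp)
    lapply(split(sub, sub$imp), function(d) lm(y ~ x1 + x2, data = d))
  })
  names(by_drop) <- paste0("drop_x3=", drop_levels)
  by_drop
})
names(by_out) <- paste0("out=", names(by_out))
```

For example, `by_out[["out=2"]][["drop_x3=3"]][["1"]]` would be the model for out = 2, x3 level 3 excluded, imp = 1.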


Solution

  • Instead of nesting loops and grouping, create a data frame with every combination of out, imp and x3, then, for each combination, filter the original data frame and fit the model.

    library(dplyr)

    #find the values for out, imp and x3
    out <- unique(tempdf1$out)
    imp <- unique(tempdf1$imp)
    x3 <- sort(unique(tempdf1$x3))
    
    #create df with every combination
    models <- expand.grid(out = out, imp = imp, x3 = x3)
    
    #loop through the combinations and name the list elements
    #("Case k" drops the rows where x3 == 6 - k: Case1 drops 5, ..., Case5 drops 1)
    output <- lapply(seq_len(nrow(models)), function(i) {
       tempdf <- tempdf1 %>% filter(x3 != (6 - models$x3[i]), imp == models$imp[i], out == models$out[i])
       lm(y ~ x1 + x2, data = tempdf)
    })
    names(output) <- paste("Out=", models$out, "Imp=", models$imp, "Case:", models$x3)
    output
    

    Note that the name of each list element records the model's conditions (out, imp and case):

    $`Out= 1 Imp= 1 Case: 1`
    
    Call:
    lm(formula = y ~ x1 + x2, data = tempdf)
    
    Coefficients:
    (Intercept)           x1           x2  
       -0.24430     -0.36825      0.01901  
    
    
    $`Out= 2 Imp= 1 Case: 1`
    
    Call:
    lm(formula = y ~ x1 + x2, data = tempdf)
    
    Coefficients:
    (Intercept)           x1           x2  
       0.290079    -0.271359    -0.008404
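To keep track of the results, the list of fits can also be flattened into one data frame with a row per model. This is a sketch in base R only: it rebuilds the answer's combinations with logical indexing instead of `dplyr::filter()`, and again assumes x3 takes the values 1 to 5 so that Case k excludes x3 = 6 - k. The names `results`, `case` and `excluded_x3` are illustrative.

```r
set.seed(505)
tempdf1 <- data.frame(
  y = rnorm(400), x1 = rnorm(400), x2 = rnorm(400),
  x3 = sample(1:5, 40, replace = TRUE),
  out = rep(1:2, each = 200),
  imp = rep(1:4, each = 20)
)

# Every combination of outcome, imp and case (out varies fastest, as in the answer)
models <- expand.grid(out = 1:2, imp = 1:4, case = 1:5)

# Case k drops the rows where x3 == 6 - k
output <- lapply(seq_len(nrow(models)), function(i) {
  d <- tempdf1[tempdf1$x3 != 6 - models$case[i] &
               tempdf1$imp == models$imp[i] &
               tempdf1$out == models$out[i], ]
  lm(y ~ x1 + x2, data = d)
})

# One row per model: its conditions followed by its coefficients
coefs <- do.call(rbind, lapply(output, coef))
results <- cbind(models, excluded_x3 = 6 - models$case, coefs)
head(results)
```

A call such as `subset(results, out == 2 & excluded_x3 == 3)` then returns the four imp-level fits for one outcome and one excluded level.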