r, purrr, lm

How to use map from the purrr package in an efficient and easy way?


I'm trying to use map from the purrr package more efficiently than I'm doing right now. I have 3 different datasets, let's say iris_1, iris_2, and iris_3.
I want to run the same linear regression for all 3 datasets. My final goal is to get all the coefficients from each of these 3 regressions using map.

My code looks like this:

library(purrr)
library(dplyr)
library(tidyr)

# Load data
iris <- iris

#-------------------------------------------------------------------------------------------------------------#
#Basic modifications
#-------------------------------------------------------------------------------------------------------------#
iris_1   <- iris %>% dplyr::filter(Species=="versicolor") 
iris_2   <- iris %>% dplyr::filter(Species=="virginica") 
iris_3   <- iris %>% dplyr::filter(Species=="setosa") 

Databases <- list(iris_1,iris_2,iris_3)

####Step A
Linear_Models <- map(Databases, ~ lm(Sepal.Length ~ Sepal.Width + Petal.Length , data = .x))
M_1     <- Linear_Models[[1]]
M_2     <- Linear_Models[[2]]
M_3     <- Linear_Models[[3]]

####Step B
Linear_Models_Coeff <- list(M_1,M_2,M_3)
Coeff <- map(Linear_Models_Coeff, ~ coef(summary(.x)))
C_M_1     <- Coeff[[1]]
C_M_2     <- Coeff[[2]]
C_M_3     <- Coeff[[3]]

I tried to do the previous steps more efficiently (that is, combining steps A and B into a single call) with the code below. However, when I try to get the coefficients, I don't get the results I got from the separate steps (i.e. C_M_1 <- Coeff[[1]]).

Linear_Models <- map(Databases, ~ lm(Sepal.Length ~ Sepal.Width + Petal.Length , data = .x),~ coef(summary(.x)))
C_M_1     <- Linear_Models[[1]]

Many thanks in advance! I know there are multiple ways of doing this with packages other than purrr, but I would really appreciate an answer that uses purrr.


Solution

  • You could do this in one go (piping all the functions inside map), e.g.

    purrr::map(Databases, ~ lm(Sepal.Length ~ Sepal.Width + Petal.Length , 
                                         data = .x) %>% summary() %>% coef()) %>% 
      set_names(c("M1", "M2", "M3"))
    

    Result:

    $M1
                  Estimate Std. Error  t value     Pr(>|t|)
    (Intercept)  2.1164314  0.4942556 4.282059 9.063960e-05
    Sepal.Width  0.2476422  0.1868389 1.325431 1.914351e-01
    Petal.Length 0.7355868  0.1247678 5.895648 3.870715e-07
    
    $M2
                  Estimate Std. Error   t value     Pr(>|t|)
    (Intercept)  0.6247824 0.52486745  1.190362 2.398819e-01
    Sepal.Width  0.2599540 0.15333757  1.695305 9.663372e-02
    Petal.Length 0.9348189 0.08960197 10.433017 8.009442e-14
    
    $M3
                  Estimate Std. Error  t value     Pr(>|t|)
    (Intercept)  2.3037382 0.38529423 5.979166 2.894273e-07
    Sepal.Width  0.6674162 0.09035581 7.386533 2.125173e-09
    Petal.Length 0.2834193 0.19722377 1.437044 1.573296e-0
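
A likely reason the combined call in the question returns models rather than coefficient tables: map() takes a single function, and any further arguments are forwarded to that function rather than applied as a second step. The second formula is therefore passed to the first lambda as an extra, unused argument. The sketch below illustrates this and shows the two-step alternative of chaining map() calls. It assumes the same three subsets as in the question; the list names M1 to M3 and the use of base subset() are illustrative choices, not part of the original answer.

```r
library(purrr)

# Same three subsets as in the question, built with base subset() so the
# sketch is self-contained; the names M1..M3 are illustrative.
Databases <- list(
  M1 = subset(iris, Species == "versicolor"),
  M2 = subset(iris, Species == "virginica"),
  M3 = subset(iris, Species == "setosa")
)

# The combined call from the question: the second formula is forwarded to the
# first lambda as an extra argument and ignored, so lm objects come back.
Attempt <- map(Databases,
               ~ lm(Sepal.Length ~ Sepal.Width + Petal.Length, data = .x),
               ~ coef(summary(.x)))
class(Attempt[[1]])

# Chaining two map() calls: fit the models first, then extract the tables.
Linear_Models <- map(Databases,
                     ~ lm(Sepal.Length ~ Sepal.Width + Petal.Length, data = .x))
Coeff <- map(Linear_Models, ~ coef(summary(.x)))

# Each element is a coefficient matrix, accessible by name.
Coeff$M1
```

Keeping the fitted models in their own list (Linear_Models) can be handy if you later need residuals or predictions as well as the coefficients; if only the coefficients matter, the one-call version above is more compact.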