
Using purrr to fit many models using tidymodels


Still getting used to Stack Overflow, so apologies if this isn't posted correctly.

Recently, I've found myself having to fit many models with slightly different predictors to compare model performance (I'm sure there's a more elegant way of doing this), and I was thinking about writing a function or using map() to do some of the heavy lifting.

Here are two reprexes that show my dilemma.

This works as expected:

library(tidymodels)
workflow() %>% 
  add_model(linear_reg()) %>% 
  add_formula(mpg ~ hp) %>% 
  fit(mtcars)

However, creating a vector of the other predictors I'd like to use and mapping over it doesn't work. It produces the error: The following predictors were not found in data: '.x'.

library(tidymodels)

preds <- c("disp", "hp", "wt")

map(preds, ~workflow() %>% add_model(linear_reg()) %>% add_formula(mpg ~ .x) %>% fit(mtcars))

I suspect this is due to tidy evaluation, but I'm struggling to find a solution to what I expect is a fairly common problem.


Solution

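  • We could use paste or reformulate to construct the formula. A formula quotes its terms, so in mpg ~ .x the .x is read as a literal column name rather than replaced by the current element of preds; building the formula from the string avoids this.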

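    library(tidymodels)
    library(purrr)

    # predictor names from the question
    preds <- c("disp", "hp", "wt")

    # build mpg ~ <predictor> from each string, then fit one workflow per formula
    modlst <- map(preds,
        ~ workflow() %>%
          add_model(linear_reg()) %>%
          add_formula(reformulate(.x, response = "mpg")) %>%
          fit(mtcars))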
    

    -output

    > modlst
    [[1]]
    ══ Workflow [trained] ════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
    Preprocessor: Formula
    Model: linear_reg()
    
    ── Preprocessor ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
    mpg ~ disp
    
    ── Model ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
    
    Call:
    stats::lm(formula = ..y ~ ., data = data)
    
    Coefficients:
    (Intercept)         disp  
       29.59985     -0.04122  
    
    
    [[2]]
    ══ Workflow [trained] ════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
    Preprocessor: Formula
    Model: linear_reg()
    
    ── Preprocessor ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
    mpg ~ hp
    
    ── Model ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
    
    Call:
    stats::lm(formula = ..y ~ ., data = data)
    
    Coefficients:
    (Intercept)           hp  
       30.09886     -0.06823  
    
    
    [[3]]
    ══ Workflow [trained] ════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
    Preprocessor: Formula
    Model: linear_reg()
    
    ── Preprocessor ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
    mpg ~ wt
    
    ── Model ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
    
    Call:
    stats::lm(formula = ..y ~ ., data = data)
    
    Coefficients:
    (Intercept)           wt  
         37.285       -5.344
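
The answer mentions paste as an alternative but only shows reformulate. Here is a minimal base R sketch of both routes, which also shows why the original mpg ~ .x failed. It uses plain lm() rather than a workflow purely to keep the check self-contained; the names f1, f2 and fits are illustrative and not part of the original answer.

```r
# Predictor names, as in the question
preds <- c("disp", "hp", "wt")

# A formula quotes its terms, so `.x` stays a literal variable name
all.vars(mpg ~ .x)

# Option 1: reformulate() builds the formula from character strings
f1 <- lapply(preds, function(p) reformulate(p, response = "mpg"))

# Option 2: paste the formula text together, then convert it with as.formula()
f2 <- lapply(preds, function(p) as.formula(paste("mpg ~", p)))

# Both routes describe the same formulas (compared as text, since the
# formula objects carry different environments)
identical(lapply(f1, deparse), lapply(f2, deparse))

# Fit with plain lm() to check the formulas; naming the list by predictor
# makes the results easier to index than [[1]], [[2]], [[3]]
fits <- lapply(setNames(f1, preds), lm, data = mtcars)
coef(fits[["wt"]])
```

The wt coefficients here should agree with the third workflow in the output above. In the purrr version, either formula list could be passed to map() and handed to add_formula() directly, and purrr::set_names(preds) would give the resulting list the same predictor names.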