
R - Using glmnet with purrr/tidyr


I am following code from this R Bloggers link in order to run models on groups within my data using tidyr and purrr. However, I would like to use glmnet rather than just lm on my nested data. Unlike lm, glmnet/cv.glmnet takes a matrix (here built with model.matrix) as its x argument rather than a formula. I need to abstract the formula fed to model.matrix, and that is what is holding me up.

So this works:

library(purrr)
library(tidyr)
library(dplyr)
library(glmnet)

mod_test <- mtcars %>%
  nest(-vs) %>%
  mutate(cv_mod = map(data, ~ cv.glmnet(
    x = model.matrix(data = ., .$mpg ~ .$cyl * .$hp)[,-1],
    y = .$mpg
  )))
> mod_test
# A tibble: 2 x 3
     vs               data          cv_mod
  <dbl>             <list>          <list>
1     0 <tibble [18 x 10]> <S3: cv.glmnet>
2     1 <tibble [14 x 10]> <S3: cv.glmnet>

But when I try to create the formula for the model.matrix separately, it does not.

mod_form <- as.formula(".$mpg ~ .$cyl * .$hp")

mod_test2 <- mtcars %>%
  nest(-vs) %>%
  mutate(cv_mod = map(data, ~ cv.glmnet(
    x = model.matrix(data = ., mod_form)[,-1],
    y = .$mpg
  )))
Error in mutate_impl(.data, dots) : object '.' not found

Solution

  • First, why does Error in mutate_impl(.data, dots) : object '.' not found occur? The following is my reasoning:

    See the documentation for as.formula:

    Formulas created with as.formula will use the env argument for their environment.

    When you create mod_form at the top level, the default in as.formula(object, env = parent.frame()) means its environment will be <environment: R_GlobalEnv>.

    Next,

    A formula object has an associated environment, and this environment (rather than the parent environment) is used by model.frame to evaluate variables that are not found in the supplied data argument.

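    So model.matrix first looks for the variables in data. The columns there are named mpg, cyl and hp, not .$mpg, so evaluating .$mpg needs an object called . that data does not contain. It then looks for . in the environment associated with the formula, R_GlobalEnv, where no object called . exists, hence the error. The inline version works because the formula is created inside the purrr lambda, so its environment is the lambda's call frame, where . is bound to the current data.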

    (correct me if some of this part is wrong.)
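
To make the reasoning above concrete, here is a minimal base R sketch of the lookup behaviour. The helpers build_x and build_x_inline are hypothetical stand-ins for the purrr lambda (they are not part of any package), and mtcars is used directly instead of the nested tibbles.

```r
# Hypothetical helper standing in for the purrr lambda: `.` is the data argument.
build_x <- function(., form) model.matrix(form, data = .)

# Formula created outside the function: its environment is wherever
# as.formula() was called (the global environment when run at top level).
form_outside <- as.formula(".$mpg ~ .$cyl * .$hp")

# `.` is not a column of the data and is not visible from the formula's
# environment, so evaluation fails. Capture the message instead of stopping.
err <- tryCatch(build_x(mtcars, form_outside),
                error = function(e) conditionMessage(e))
print(err)

# Formula written inside the function body: its environment is the call
# frame, where `.` is bound to the data, so `.$mpg` can be resolved.
build_x_inline <- function(.) model.matrix(.$mpg ~ .$cyl * .$hp, data = .)
mm_inline <- build_x_inline(mtcars)

# Formula written with bare column names: everything is found in `data`,
# so the formula's environment no longer matters.
form_cols <- mpg ~ cyl * hp
mm_ok <- build_x(mtcars, form_cols)
print(colnames(mm_ok))
```

The third variant is the one that generalises, since the formula can be defined anywhere and reused across groups.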


    Second, the solution: write the formula with the bare column names, so that model.matrix finds them in data. Try:

    mod_form <- mpg ~ cyl * hp
    # or
    mod_form <- as.formula('mpg ~ cyl * hp')
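
Putting the fix back into the pipeline from the question might look like the sketch below. It assumes tidyr 1.0 or later, where nest(data = -vs) is the current spelling of nest(-vs), and it uses .x rather than . inside the lambda; set.seed is only there because cv.glmnet assigns folds at random. With groups this small, cv.glmnet may emit warnings about fold sizes.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(glmnet)

# Formula in terms of column names, defined once outside the pipeline.
mod_form <- mpg ~ cyl * hp

set.seed(1)  # fold assignment in cv.glmnet is random

mod_test2 <- mtcars %>%
  nest(data = -vs) %>%                      # one nested tibble per value of vs
  mutate(cv_mod = map(data, ~ cv.glmnet(
    # Drop the intercept column; glmnet fits its own intercept.
    x = model.matrix(mod_form, data = .x)[, -1],
    y = .x$mpg
  )))

mod_test2
```

Because mod_form no longer refers to ., the same formula object can be swapped for another one without touching the pipeline.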