Multiple linear model prediction in dplyr

I'm trying to generate predictions for multiple models at the same time with dplyr, using the script below. Unfortunately, this results in duplicated data that does not make sense. All I want is the original data along with 2 model columns (1 for each model) and 2 columns with the predicted values. Thank you.


d<-gapminder %>% 
  group_by(continent) %>%
  nest() %>% 
  mutate(model = data %>% map(~lm(lifeExp ~ pop, data = .))) %>% 
  mutate(model = data %>% map(~lm(lifeExp ~ pop + gdpPercap , data = .))) %>% 
  mutate(Pred = map2(model, data, predict)) %>% 
  mutate(Pred1 = map2(model, data, predict)) %>% 
  unnest(Pred, Pred1, data)


  • We could use nest_by and create the model columns in mutate, then ungroup to remove the rowwise attribute created by nest_by. Next, loop over the 'data' and model columns with pmap, referring to the columns in the order of selection, i.e. ..1 -> data, ..2 -> model1 and ..3 -> model2. Create the new 'Pred' columns in the 'data' (..1), remove the model columns with select, and unnest the 'data'.

    library(dplyr)
    library(tidyr)
    library(purrr)
    library(gapminder)

    gapminder %>%
         nest_by(continent)  %>% 
         mutate(model1 = list(lm(lifeExp ~ pop, data = data)),
                model2 = list(lm(lifeExp ~ pop + gdpPercap, data = data ))) %>% 
         ungroup %>% 
         mutate(data = pmap(select(., data, model1, model2),  
              ~ ..1 %>%
                  mutate(Pred1 = predict(..2, ..1), Pred2 = predict(..3, ..1)))) %>%
        select(-model1, -model2) %>%
        unnest(data)

    # A tibble: 1,704 x 8
    #   continent country  year lifeExp      pop gdpPercap Pred1 Pred2
    #   <fct>     <fct>   <int>   <dbl>    <int>     <dbl> <dbl> <dbl>
    # 1 Africa    Algeria  1952    43.1  9279525     2449.  48.8  49.2
    # 2 Africa    Algeria  1957    45.7 10270856     3014.  48.9  50.0
    # 3 Africa    Algeria  1962    48.3 11000948     2551.  48.9  49.4
    # 4 Africa    Algeria  1967    51.4 12760499     3247.  49.1  50.5
    # 5 Africa    Algeria  1972    54.5 14760787     4183.  49.2  52.0
    # 6 Africa    Algeria  1977    58.0 17152804     4910.  49.4  53.2
    # 7 Africa    Algeria  1982    61.4 20033753     5745.  49.6  54.6
    # 8 Africa    Algeria  1987    65.8 23254956     5681.  49.8  54.7
    # 9 Africa    Algeria  1992    67.7 26298373     5023.  50.0  54.0
    #10 Africa    Algeria  1997    69.2 29072015     4797.  50.2  53.9
    # … with 1,694 more rows
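
    The positional arguments used in the pmap call above can be seen in isolation with a toy list. This is only a minimal sketch: the names a, b and c are made up for illustration and stand in for the 'data', 'model1' and 'model2' columns.

    ```r
    library(purrr)

    # Hypothetical toy inputs standing in for the three selected columns
    inputs <- list(
      a = c(1, 2),
      b = c(10, 20),
      c = c(100, 200)
    )

    # In a formula lambda, ..1, ..2 and ..3 are the first, second and third
    # elements of each parallel slice, in the order the inputs are supplied
    sums <- pmap_dbl(inputs, ~ ..1 + 2 * ..2 + 3 * ..3)
    ```

    Here ..1 always comes from a, ..2 from b and ..3 from c, which is why the order of columns in select() matters in the code above.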

    Or, without using pmap, we can create the prediction columns with across inside mutate, then unnest

    gapminder %>%
         nest_by(continent) %>% 
         mutate(model1 = list(lm(lifeExp ~ pop, data = data)),
                model2 = list(lm(lifeExp ~ pop + gdpPercap, data = data )),
                across(starts_with('model'),  ~ list(Predict = predict(., data)),
                 .names = "{.col}_Predict")) %>% 
         select(-model1, -model2)  %>%
         ungroup %>% 
         unnest(c(data, model1_Predict, model2_Predict))


    # A tibble: 1,704 x 8
    #   continent country  year lifeExp      pop gdpPercap model1_Predict model2_Predict
    #   <fct>     <fct>   <int>   <dbl>    <int>     <dbl>          <dbl>          <dbl>
    # 1 Africa    Algeria  1952    43.1  9279525     2449.           48.8           49.2
    # 2 Africa    Algeria  1957    45.7 10270856     3014.           48.9           50.0
    # 3 Africa    Algeria  1962    48.3 11000948     2551.           48.9           49.4
    # 4 Africa    Algeria  1967    51.4 12760499     3247.           49.1           50.5
    # 5 Africa    Algeria  1972    54.5 14760787     4183.           49.2           52.0
    # 6 Africa    Algeria  1977    58.0 17152804     4910.           49.4           53.2
    # 7 Africa    Algeria  1982    61.4 20033753     5745.           49.6           54.6
    # 8 Africa    Algeria  1987    65.8 23254956     5681.           49.8           54.7
    # 9 Africa    Algeria  1992    67.7 26298373     5023.           50.0           54.0
    #10 Africa    Algeria  1997    69.2 29072015     4797.           50.2           53.9
    # … with 1,694 more rows
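
    If the model columns should be kept, as the question asks, the original map/map2 approach could also be adapted. One visible problem in the original attempt is that the second mutate overwrites 'model', so both prediction columns come from the same fit. The sketch below assumes distinct column names (model1, model2, Pred1, Pred2, chosen here for illustration) and a single unnest call over the list columns that have one element per row.

    ```r
    library(dplyr)
    library(tidyr)
    library(purrr)
    library(gapminder)

    d <- gapminder %>%
      group_by(continent) %>%
      nest() %>%
      mutate(
        # one fitted model per continent, stored under separate names
        model1 = map(data, ~ lm(lifeExp ~ pop, data = .x)),
        model2 = map(data, ~ lm(lifeExp ~ pop + gdpPercap, data = .x)),
        # predict(model, newdata) for each model/data pair
        Pred1 = map2(model1, data, predict),
        Pred2 = map2(model2, data, predict)
      ) %>%
      ungroup() %>%
      # expand the data and predictions together; the model list columns
      # are repeated for every row of their continent
      unnest(c(data, Pred1, Pred2))
    ```

    The result should have the original 1,704 rows, with model1 and model2 remaining as list columns next to the numeric Pred1 and Pred2.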