I'm trying to generate predictions for multiple models at the same time with dplyr, using the script below. Unfortunately, this results in duplicated data that does not really make sense. All I want is the original data along with 2 model columns (1 for each model) and 2 columns with the predicted values. Thank you.
library(modelr)
library(dplyr)
library(purrr)
library(tidyr)
install.packages("gapminder")
library(gapminder)
data(gapminder)
d <- gapminder %>%
  group_by(continent) %>%
  nest() %>%
  mutate(model = data %>% map(~lm(lifeExp ~ pop, data = .))) %>%
  mutate(model = data %>% map(~lm(lifeExp ~ pop + gdpPercap, data = .))) %>%
  mutate(Pred = map2(model, data, predict)) %>%
  mutate(Pred1 = map2(model, data, predict)) %>%
  unnest(Pred, Pred1, data)
We could use `nest_by` and create the model columns in `mutate`, then `ungroup` to remove the rowwise attributes created by `nest_by`. Next, loop over the 'data' and model columns with `pmap`, where the elements are referenced in the order of `select`ion, i.e. `..1` -> data, `..2` -> model1 and `..3` -> model2. Create the new "Pred" columns in the 'data' (`..1`), remove the model columns with `select`, and `unnest` the 'data'.
library(dplyr)
library(purrr)
library(tidyr)
gapminder %>%
  nest_by(continent) %>%
  # fit both models per continent; list() keeps each lm in a list column
  mutate(model1 = list(lm(lifeExp ~ pop, data = data)),
         model2 = list(lm(lifeExp ~ pop + gdpPercap, data = data))) %>%
  ungroup %>%
  # inside pmap: ..1 = data, ..2 = model1, ..3 = model2
  mutate(data = pmap(select(., data, model1, model2),
                     ~ ..1 %>%
                       mutate(Pred1 = predict(..2, ..1),
                              Pred2 = predict(..3, ..1)))) %>%
  select(-model1, -model2) %>%
  unnest(c(data))
# A tibble: 1,704 x 8
#   continent country  year lifeExp      pop gdpPercap Pred1 Pred2
#   <fct>     <fct>   <int>   <dbl>    <int>     <dbl> <dbl> <dbl>
# 1 Africa    Algeria  1952    43.1  9279525     2449.  48.8  49.2
# 2 Africa    Algeria  1957    45.7 10270856     3014.  48.9  50.0
# 3 Africa    Algeria  1962    48.3 11000948     2551.  48.9  49.4
# 4 Africa    Algeria  1967    51.4 12760499     3247.  49.1  50.5
# 5 Africa    Algeria  1972    54.5 14760787     4183.  49.2  52.0
# 6 Africa    Algeria  1977    58.0 17152804     4910.  49.4  53.2
# 7 Africa    Algeria  1982    61.4 20033753     5745.  49.6  54.6
# 8 Africa    Algeria  1987    65.8 23254956     5681.  49.8  54.7
# 9 Africa    Algeria  1992    67.7 26298373     5023.  50.0  54.0
#10 Africa    Algeria  1997    69.2 29072015     4797.  50.2  53.9
# … with 1,694 more rows
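To see the positional `..1`, `..2`, `..3` references in isolation, here is a minimal sketch on a toy list. The list `args` and its element names are made up for illustration; they only stand in for the 'data', 'model1' and 'model2' columns passed to `pmap` above.

```r
library(purrr)

# Hypothetical toy input: three parallel elements, standing in for
# the data, model1 and model2 columns
args <- list(a = 1:3, b = c(10, 20, 30), c = c(100, 200, 300))

# In a formula lambda, ..1, ..2 and ..3 refer to the first, second and
# third elements of the list by position, regardless of their names
res <- pmap_dbl(args, ~ ..1 + ..2 + ..3)
res
```

The same idea applies when the input is a data frame, since `pmap` iterates over its columns in parallel, row by row.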
Or, without using `pmap`, we can create the new columns with `across` inside `mutate`, then `unnest`
gapminder %>%
  nest_by(continent) %>%
  mutate(model1 = list(lm(lifeExp ~ pop, data = data)),
         model2 = list(lm(lifeExp ~ pop + gdpPercap, data = data)),
         # predict from each model column; .names gives model1_Predict, model2_Predict
         across(starts_with('model'), ~ list(Predict = predict(., data)),
                .names = "{.col}_Predict")) %>%
  select(-model1, -model2) %>%
  ungroup %>%
  unnest(c(data, model1_Predict, model2_Predict))
-output
# A tibble: 1,704 x 8
#   continent country  year lifeExp      pop gdpPercap model1_Predict model2_Predict
#   <fct>     <fct>   <int>   <dbl>    <int>     <dbl>          <dbl>          <dbl>
# 1 Africa    Algeria  1952    43.1  9279525     2449.           48.8           49.2
# 2 Africa    Algeria  1957    45.7 10270856     3014.           48.9           50.0
# 3 Africa    Algeria  1962    48.3 11000948     2551.           48.9           49.4
# 4 Africa    Algeria  1967    51.4 12760499     3247.           49.1           50.5
# 5 Africa    Algeria  1972    54.5 14760787     4183.           49.2           52.0
# 6 Africa    Algeria  1977    58.0 17152804     4910.           49.4           53.2
# 7 Africa    Algeria  1982    61.4 20033753     5745.           49.6           54.6
# 8 Africa    Algeria  1987    65.8 23254956     5681.           49.8           54.7
# 9 Africa    Algeria  1992    67.7 26298373     5023.           50.0           54.0
#10 Africa    Algeria  1997    69.2 29072015     4797.           50.2           53.9
# … with 1,694 more rows
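If you want to convince yourself that the predictions really come from the per-group model, one possible sanity check is sketched below. It uses the built-in `mtcars` data as a stand-in for gapminder (an assumption, chosen so the snippet runs without extra packages), with `cyl` playing the role of 'continent'. The names `out`, `fit4` and `check` are just illustrative.

```r
library(dplyr)
library(tidyr)

# Same nest_by pattern as above, on a built-in dataset (stand-in for gapminder)
out <- mtcars %>%
  nest_by(cyl) %>%
  mutate(model1 = list(lm(mpg ~ wt, data = data)),
         Pred1 = list(predict(model1, data))) %>%
  select(-model1) %>%
  ungroup() %>%
  unnest(c(data, Pred1))

# Fit the same model directly on a single group for comparison
fit4 <- lm(mpg ~ wt, data = filter(mtcars, cyl == 4))

# Compare the unnested predictions for that group with the direct fit
check <- all.equal(unname(out$Pred1[out$cyl == 4]), unname(fitted(fit4)))
check
```

The result should have one row per original observation, so no duplication is introduced by the `unnest` step.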