
Getting predictions from one dataset for multiple models in R


I am trying to take a dataset and feed it through multiple pre-fitted models so that I can see how the predictions differ from model to model. The data generally look like this:

data_1:

  Year day station1 station2 station3 hour minute avtemp
1 2020   1    1.124    1.018    0.852   00     30        0.998
2 2020   1    1.123    1.020    0.848   01      0        0.997
3 2020   1    1.119    1.013    0.842   01     30        0.991
4 2020   1    1.124    1.016    0.845   02      0        0.995
5 2020   1    1.124    1.016    0.842   02     30        0.994
6 2020   1    1.124    1.017    0.840   03      0        0.994

The models were fitted on a separate dataset, data_2, which is structured very similarly except that its rows are divided by "stand", the experimental unit. I need one model per stand, hence the multiple models. They were generated with the following code:


models_temp <- data_2 %>% 
  group_by(stand) %>% 
  do(modeltt = lm(projectedtemp ~ avtemp, data = .)) %>% 
  ungroup()

As you can see, the independent variable in the model (avtemp) matches a column in data_1, so in principle the models should be able to read data_1 directly. The code above produces a dataset with two columns: one with the stand, and one holding the fitted model for each stand. Each model is stored as a list() with a lot of data in it, as shown here:

stand             modeltt
trees             list(coefficients = c(`(Intercept)` = 0.66135577718....)
shrubs            list(coefficients = c(`(Intercept)` = 0.6468382809102...)

I then tried various versions of add_predictions, such as the spread_predictions call below, to generate predictions for data_1 from this list of models:


data_3 <- spread_predictions(data = data_1, models = models_temp)

Alas, I get the following error:


Error in UseMethod("predict") : 
  no applicable method for 'predict' applied to an object of class "list"


I searched Stack Overflow and couldn't find specific examples of people doing this, at least not without dramatically restructuring their models. Does anyone know of a better function for what I see as a relatively simple task, a better way to structure my models or data, or a simple fix for this error? Thank you all so much in advance.

The libraries loaded are as follows; I believe most of this relies on the "modelr" package:

library(dplyr)
library(ggplot2)
library(tidyverse)
library(gridExtra)

Solution

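  • The error message says that predict was handed a list rather than a single fitted model. The models live in the list column modeltt, so we can loop over that column with map and apply spread_predictions to each model in turn:

    library(purrr)
    library(modelr)  # provides spread_predictions
    # Returns a list with one copy of data_1 per model, each with an added prediction column
    map(models_temp$modeltt, ~ spread_predictions(data = data_1, models = .x))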
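
If a single wide table with one prediction column per stand is more convenient than a list of data frames, the same idea can be sketched in base R. This swaps spread_predictions for plain predict() and is only an illustration: the toy data_2 and data_1 below are made-up stand-ins for the real datasets, and the names fits, preds and data_3 are hypothetical.

```r
# Toy stand-ins for the real data (assumed values, not from the question)
set.seed(1)
data_2 <- data.frame(
  stand  = rep(c("trees", "shrubs"), each = 20),
  avtemp = runif(40, 0.8, 1.2)
)
data_2$projectedtemp <- ifelse(data_2$stand == "trees", 0.66, 0.65) +
  0.3 * data_2$avtemp + rnorm(40, sd = 0.01)
data_1 <- data.frame(avtemp = c(0.998, 0.997, 0.991))

# Fit one lm per stand and keep the fits in a list named by stand
fits <- lapply(
  split(data_2, data_2$stand),
  function(d) lm(projectedtemp ~ avtemp, data = d)
)

# Predict data_1 with every model; sapply returns one column per stand
preds <- sapply(fits, predict, newdata = data_1)

# Bind the predictions next to the original data
data_3 <- cbind(data_1, preds)
data_3
```

Keeping the fits in a list named by stand means the prediction columns are labelled automatically, which makes it easy to compare the models side by side.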