Search code examples
rlinear-regressionpredict

How can I get 1000 predictions from a linear model in R?


I want to simulate the predicted value from a linear regression model 1000 times and see how many of those times each type of car has the highest predicted mpg base on the independent variables of the model. I use the test and training set because I want to evaluate the fit of the model outside the training data.

data(mtcars)
library(caret)
trainingIndex <- createDataPartition(mtcars$mpg, p = 0.8, list = FALSE)
trainingset <- mtcars[trainingIndex,]
testingset <- mtcars[-trainingIndex,]

I create a data partition to have a training set and a test set. Now I have a test set and a training set I create the linear model

fit <- lm(mpg~., data = trainingset)

Now I have the linear model I tried to create a bootstrap to make the prediction from a simulation. I use boot_predict but it gives me an error.

library(finalfit)
boot_predict(fit,testingset, type = "response", R = 1000, estimate_name = NULL, 
             confint_sep = "to", condense = TRUE, boot_compare = TRUE, compare_name = NULL,
             comparison = "difference", ref_symbol = "-", digits = c(2,3))

Error: Problem with mutate() input term. x invalid format '%.2f'; use format %s for character objects i Input term is (function (x, digits) .... Run rlang::last_error() to see where the error occurred. In addition: Warning message: In predict.lm(object, newdata, se.fit, scale = 1, type = if (type == : prediction from a rank-deficient fit may be misleading

I am not sure if this is the best way to obtaining the 1000 prediction from the bootstrap


Solution

  • The part about how to use the training and testing remains unclear and I suggest you can sort that out and put it as another question. It seems that there is more than one question packed in this.

    I can try and address this:

    see how many of those times each type of car has the highest predicted mpg base on the independent variables of the model.

    For 1 bootstrap, the basic code to fit goes like:

    set.seed(111)
    da = mtcars[sample(nrow(mtcars),replace=TRUE),]
    fit = lm(mpg ~ .,data=da)
    

    To get the ranking we can do:

    rank(predict(fit,mtcars))
    

    And we wrap this into a function and iterate this through many bootstraps:

    bootpred = function(data){
    
    da = da[sample(nrow(da),replace=TRUE),]
    fit = lm(mpg ~ .,data=da)
    rank(predict(fit,data))
    
    }
    
    predictions = replicate(1000,bootpred(mtcars))
    

    The results is a matrix, every column 1 bootstrap, every row, the rank of the prediction for the car:

    head(predictions[,1:5],10)
                      [,1] [,2] [,3] [,4] [,5]
    Mazda RX4           18   16   12   20   18
    Mazda RX4 Wag       14   12   11   16   17
    Datsun 710          24   24   27   26   23
    Hornet 4 Drive      22   19   20   23   21
    Hornet Sportabout   15   13   15   11   15
    Valiant             16   11   18   21   20
    Duster 360           7   29    5    6    9
    Merc 240D           23   23   23   25   25
    Merc 230            25   20   19   24   32
    Merc 280            20   18   22   17   16
    

    This will tell you how many times each car has the highest value:

    rowSums(predictions==1)
              Mazda RX4       Mazda RX4 Wag          Datsun 710      Hornet 4 Drive 
                     18                   0                   0                   0 
      Hornet Sportabout             Valiant          Duster 360           Merc 240D 
                      0                   0                   3                   1 
               Merc 230            Merc 280           Merc 280C          Merc 450SE 
                     12                   1                   0                   1 
             Merc 450SL         Merc 450SLC  Cadillac Fleetwood Lincoln Continental 
                      0                   0                  80                  72 
      Chrysler Imperial            Fiat 128         Honda Civic      Toyota Corolla 
                    174                   0                   3                   0 
          Toyota Corona    Dodge Challenger         AMC Javelin          Camaro Z28 
                     18                   3                   0                  10 
       Pontiac Firebird           Fiat X1-9       Porsche 914-2        Lotus Europa 
                      3                   0                   0                   0 
         Ford Pantera L        Ferrari Dino       Maserati Bora          Volvo 142E 
                      0                  10                 591                   0