Tags: r, validation, external, r-caret, glmnet

Save out a caret prediction model and apply to external data in R


I have fit a caret prediction model:

fit <- train(outcome~ ., data = training, 
                    method = 'glmnet', 
                    metric = "ROC",
                    tuneLength = 5,
                    trControl = fitControl)

fit

Now I want to apply that model to an out-of-sample (external) validation set. However, I do not have access to that data, so I am sending the final models to a collaborator to apply to their data.

I originally saved out the final model by:

combined_coef<-as.matrix(exp(coef(fit$finalModel, fit$bestTune$lambda)))

so that it could be read in and applied to the new data:

fitValidation <- predict(fit, newdata = validation, type = "prob")

It wouldn't work with the saved model read in as a data frame or as a matrix, and when it was read in as a list, the error message was:

Error in UseMethod("predict") : 
  no applicable method for 'predict' applied to an object of class "c('tbl_df', 'tbl', 'data.frame')"

So does it have to be the whole model fit object? Is there an easier way to do that than save out and send the whole (massive) fit object? Is there a way of only saving out the 'final model' (as above) and then applying this in the 'predict' call?

Thanks


Solution

  • As Sirius says, the best way to do this would be to just save the model object. It shouldn't be that large.

    However, in a pinch, your collaborator could score the model by hand by multiplying the validation matrix by the vector of coefficients. The code would look like the below, assuming a matrix validation in the same format as your model matrix (with a leading column of ones for the intercept, and columns in the same order as the coefficients) and the coefficients as a vector on the log-odds scale, i.e. not exponentiated as in your combined_coef. This calculation is for logistic regression; since you are using ROC as your fit metric, that should be what you need.

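    # multiply each column of the validation matrix by its coefficient
    # (coefficients must be on the log-odds scale, i.e. not exponentiated)
    scores <- t(t(validation) * coefficients)
    
    # sum the scores by row to get the log-odds, then exponentiate to get the odds
    odds <- exp(rowSums(scores, na.rm = TRUE))
    # convert the odds to predicted probabilities
    final_scores <- odds / (1 + odds)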
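
The hand-scoring route above can be checked end to end with a small sketch. This is only an illustration under stated assumptions: the data are simulated, and a base R glm() logistic regression stands in for the caret/glmnet fit so the example runs without extra packages. The names fit_glm, coef_file, coefficients_in and X are hypothetical. With glmnet, the coefficient vector would instead come from coef(fit$finalModel, s = fit$bestTune$lambda), left un-exponentiated.

```r
# Hypothetical stand-in data: a binary outcome and two numeric predictors.
set.seed(1)
training <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
training$outcome <- rbinom(200, 1, plogis(0.5 + training$x1 - 0.8 * training$x2))
validation <- data.frame(x1 = rnorm(50), x2 = rnorm(50))

# Assumption: plain logistic regression stands in for the glmnet final model.
fit_glm <- glm(outcome ~ ., data = training, family = binomial)
coefficients <- coef(fit_glm)  # named vector on the log-odds scale

# Sender: save only the coefficients rather than the whole fit object.
coef_file <- tempfile(fileext = ".rds")
saveRDS(coefficients, coef_file)

# Collaborator: read the coefficients and build a design matrix whose
# columns match them, including the "(Intercept)" column of ones.
coefficients_in <- readRDS(coef_file)
X <- model.matrix(~ ., data = validation)
X <- X[, names(coefficients_in), drop = FALSE]

# Linear predictor (log-odds), then the inverse logit to get probabilities.
log_odds <- drop(X %*% coefficients_in)
final_scores <- plogis(log_odds)

# Compare against predict() on the original model object.
reference <- predict(fit_glm, newdata = validation, type = "response")
all.equal(unname(final_scores), unname(reference))
```

Reordering the design matrix by names(coefficients_in) guards against the collaborator's columns arriving in a different order, which would otherwise go unnoticed in an element-wise multiplication. plogis() is the inverse logit, so it gives the same result as odds / (1 + odds) without the intermediate exponentiation.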