I have run a caret prediction model
fit <- train(outcome~ ., data = training,
method = 'glmnet',
metric = "ROC",
tuneLength = 5,
trControl = fitControl)
fit
Now I want to apply that model to an out-of-sample (external) validation set. However, I do not have access to this data, so I am sending the final model to a collaborator for them to apply to their data.
I originally saved out the final model by:
combined_coef<-as.matrix(exp(coef(fit$finalModel, fit$bestTune$lambda)))
so that it could be read in and applied to the new data:
fitValidation <- predict(fit, newdata = validation, type = "prob")
It wouldn't work with the saved coefficients read in as a data frame or as a matrix, and when they were read in as a list, the error message was:
"Error in UseMethod("predict") :
no applicable method for 'predict' applied to an object of class "c('tbl_df', 'tbl', 'data.frame')"
So does it have to be the whole model fit object? Is there an easier way to do that than save out and send the whole (massive) fit object? Is there a way of only saving out the 'final model' (as above) and then applying this in the 'predict' call?
Thanks
As Sirius says, the best way to do this would be to just save the model object. It shouldn't be that large.
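A minimal sketch of that round trip, using only base R. The glm model on the built-in mtcars data is a stand-in for the caret object fit from the question, and the file name fit.rds is hypothetical; the same saveRDS / readRDS calls should apply to the caret object.

```r
# Stand-in model: a base-R logistic regression on the built-in mtcars data.
# In the question this would be the caret object `fit`.
model <- glm(am ~ wt + hp, data = mtcars, family = binomial)

# Sender: serialise the whole fitted object to a single compressed file.
model_path <- file.path(tempdir(), "fit.rds")  # hypothetical file name
saveRDS(model, model_path)

# Collaborator: read the object back and predict on their own data.
restored <- readRDS(model_path)
validation <- mtcars[1:5, ]                    # placeholder for the external data
pred_restored <- predict(restored, newdata = validation, type = "response")

# For comparison only: predictions from the object that was never saved.
pred_original <- predict(model, newdata = validation, type = "response")
```

With the caret object, the collaborator would presumably call predict(fit, newdata = validation, type = "prob") as in the question, with caret and glmnet installed on their side. By default a caret train object keeps a copy of the training data, which accounts for much of its size; setting returnData = FALSE (and possibly trim = TRUE) in trainControl before fitting is worth trying if the file is too large or the training data should not be shared.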
However, in a pinch, the other option would be for your collaborator to score the model by hand. One can do this by multiplying the validation matrix against the vector of coefficients. The code would look like the following, given that you have a matrix validation in the same format as your model matrix and coefficients as a vector. This calculation is for logistic regression, and given you are using ROC as your fit metric, this should be what you need.
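# Scale each column of `validation` by its coefficient (element-wise recycling).
# `validation` needs a leading column of 1s for the intercept, and `coefficients`
# must be the raw log-odds coefficients, not exp(coef(...)).
scores <- t(t(validation) * coefficients)
# Sum the contributions by row to get the log-odds, then exponentiate to get the odds.
# Note: na.rm = TRUE treats a missing predictor value as contributing zero.
odds <- exp(rowSums(scores, na.rm = TRUE))
# Convert the odds to predicted probabilities.
final_scores <- odds / (1 + odds)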
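The hand calculation above can be checked against predict on a model where both are available. The sketch below is an illustration under assumptions, not the glmnet workflow itself: it uses a base-R glm on mtcars as a stand-in, builds the design matrix with model.matrix, and uses a true matrix product with plogis (the inverse logit) in place of the element-wise version.

```r
# Stand-in model (base R), used only so the hand calculation can be checked.
model <- glm(am ~ wt + hp, data = mtcars, family = binomial)

# Raw coefficients on the log-odds scale, intercept first. Do not exponentiate.
coefficients <- coef(model)

# Design matrix for the data to score: a column of 1s for the intercept,
# then the predictors in the same order as the coefficients.
validation <- model.matrix(~ wt + hp, data = mtcars)

# Linear predictor (log-odds) via a matrix product; drop() turns n x 1 into a vector.
log_odds <- drop(validation %*% coefficients)

# Inverse logit: plogis(x) is 1 / (1 + exp(-x)).
final_scores <- plogis(log_odds)

# Reference probabilities from the fitted object itself.
reference <- predict(model, type = "response")
```

Here final_scores and reference should agree up to floating-point error. For the glmnet model in the question, the coefficients would presumably come from coef(fit$finalModel, s = fit$bestTune$lambda) without the exp() wrapper; that call returns a one-column sparse matrix with the intercept in the first row, so it needs converting (for example with as.matrix) before being saved and multiplied. Any preprocessing done inside train would also have to be repeated on the collaborator's data.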