I'm trying to validate the performance of a generalized linear model that has a continuous output. From my research it seems that the most appropriate way to validate a continuous model is with R-squared, adjusted R-squared and RMSE (correct me if I'm wrong), rather than with the confusion-matrix metrics (accuracy, precision, F1 etc.) used for binomial models.
How do I find the R-squared value for my model, based on the actual vs. predicted values? Below is the code for my glm model; the data has been split into train and test.
Quite new to this, so open to suggestions.
#GENERALISED LINEAR MODEL
LR_swim <- glm(racetime_mins ~ event_month + gender + place +
                 clocktime_mins + handicap_mins +
                 Wind_Speed_knots +
                 Air_Temp_Celsius + Water_Temp_Celsius + Wave_Height_m,
               data = SwimmingTrain,
               family = gaussian(link = "identity"))
summary(LR_swim)
#Predict Race_Time
pred_LR <- predict(LR_swim, SwimmingTest, type = "response")
pred_LR
Such performance measures can be implemented with a simple line of R code. So, for some dummy data:
preds <- c(1.0, 2.0, 9.5)
actuals <- c(0.9, 2.1, 10.0)
the mean squared error (MSE) is simply
mean((preds-actuals)^2)
# [1] 0.09
while the mean absolute error (MAE) is
mean(abs(preds-actuals))
# [1] 0.2333333
and the root mean squared error (RMSE) is simply the square root of the MSE, i.e.:
sqrt(mean((preds-actuals)^2))
# [1] 0.3
The last two measures have the additional advantage of being on the same scale as your original data (which is not the case for MSE).
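Since the question asks about R-squared specifically, here is a minimal sketch computing R-squared and adjusted R-squared from the same dummy actuals and predictions; the number of predictors k below is a placeholder you would replace with the number of predictors in your own model:

```r
preds   <- c(1.0, 2.0, 9.5)
actuals <- c(0.9, 2.1, 10.0)

rss <- sum((actuals - preds)^2)           # residual sum of squares
tss <- sum((actuals - mean(actuals))^2)   # total sum of squares
rsq <- 1 - rss / tss                      # R-squared
rsq
# [1] 0.994477

n <- length(actuals)  # number of observations
k <- 1                # placeholder: number of predictors in your model
adj_rsq <- 1 - (1 - rsq) * (n - 1) / (n - k - 1)  # adjusted R-squared
adj_rsq
# [1] 0.988954
```

Note also that a gaussian glm with an identity link is mathematically the same model as an ordinary linear regression, so if you fit it with lm() instead, summary() will report Multiple R-squared and Adjusted R-squared for you directly.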