I am using a logistic regression model to predict values in a raster dataset. Data used in the model are in the following format:
class b1 b2 b3 b4
A 121 111 90 160
A 100 90 67 90
B 90 120 102 154
...
I would expect the output of the model to be categorical (A or B; there are only two classes). Instead, the glm
model yields continuous values ranging from 0 to 1. Either my interpretation of the model output is incorrect, or I am coding this wrong. How should I interpret these results?
# GLM
myglm = glm(factor(class) ~ b1 + b2 + b3 + b4, data = df, family = binomial(link = "logit"))
# Predict results and write to image
predict(sf, myglm, outpath, type="response",
index=1, na.rm=TRUE, progress="text", overwrite=TRUE)
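The output is correct. You should interpret these values as probabilities. The base (reference) level of the factor sets which class the probability is for: with a factor response, glm treats the first level as "failure" and models the probability of the other level.
A value of 0.7 means a 70% probability of the data point belonging to class B if A is the first level (the default, alphabetical ordering), or to class A if you set B as the base level.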
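To check which class the probabilities refer to, you can inspect the factor levels and, if needed, change the base level with relevel(). This is only a minimal sketch: the data frame below is made-up toy data in the same layout as the question, with a single band b1, and the names fit, fit_A and class_A are just for illustration.

```r
# Toy data (made up for illustration), same layout as in the question
df <- data.frame(
  class = c("A", "A", "A", "A", "B", "B", "B", "B"),
  b1    = c(121, 100, 110, 92, 90, 95, 85, 105)
)
df$class <- factor(df$class)
levels(df$class)  # the first level is the base ("failure") class

# With a factor response, glm models the probability of NOT being the first level
fit <- glm(class ~ b1, data = df, family = binomial(link = "logit"))
p_B <- predict(fit, type = "response")    # P(class == "B")

# To get P(class == "A") instead, make "B" the base level and refit
df$class_A <- relevel(df$class, ref = "B")
fit_A <- glm(class_A ~ b1, data = df, family = binomial(link = "logit"))
p_A <- predict(fit_A, type = "response")  # P(class == "A")
```

The two sets of probabilities are complements of each other (p_A is 1 - p_B up to numerical tolerance), so releveling changes only which class the number describes, not the fit itself.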
If you want binary classes as output, you have to decide on a probability cut-off. If the prevalence is 50%, then 0.5 should suffice as a cut-off.
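Applying the cut-off could look roughly like this. The probabilities here are hypothetical values standing in for the model output, and the sketch assumes that A is the first factor level, so the probabilities are for class B.

```r
# Hypothetical predicted probabilities, standing in for values from the output raster
p <- c(0.12, 0.48, 0.50, 0.70, 0.93)

# Cut-off chosen by you; 0.5 is a reasonable start when both classes are equally common
cutoff <- 0.5

# Probabilities at or above the cut-off become "B", everything else "A"
pred_class <- factor(ifelse(p >= cutoff, "B", "A"), levels = c("A", "B"))
table(pred_class)
```

The same comparison should carry over to the predicted probability raster, since the raster package supports comparison operators on Raster objects, but check that against your own data before relying on it.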