I would be happy if someone could help me understand a glm with a binomial error distribution.
Let's assume the following data frame:
```r
year <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
          1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
          2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
          3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3)
success <- c(1, 0, 3, 1, 1, 2, 6, 0, 1, 1, 12, 2, NA, 6, 12, 0, 10,
             7, 4, 10, 13, 1, 2, 1, 18, 6, 3, 8, 3, 1, 9, 15, 6, 12,
             6, 15, 13, 6, 8, 6, 2, 11, 6, 1, 12, 0, 4, 15, 0, 3, 18,
             5, 6, 17, 5, 3, 17, 8, 0, 7, 12, 10, 26, 12, 4, 17, 1, 8,
             2, 7, 14, 8)
no_success <- c(1, 9, 5, 4, 6, 1, 4, 4, 6, 10, 16, 4, NA, 3, NA, 3,
                5, 5, 6, 10, 0, 5, 3, 10, 1, 7, 11, 8, 20, 4, 3, 3,
                19, 1, 11, 4, 6, 4, 9, 4, 10, 4, 2, 8, 3, 1, 13, 3,
                5, 7, 5, 9, 3, 6, 3, 4, 3, 13, 6, 5, 10, 3, 1, 0,
                18, 6, 13, 0, 3, 2, 2, 2)
df <- data.frame(year, success, no_success)
df$success <- as.integer(df$success)
df$no_success <- as.integer(df$no_success)
```
To test whether the success of a hypothetical treatment (success versus no_success) shows a linear increase or decrease across years, linear on the log-odds scale, I fit a binomial glm:
```r
m <- glm(cbind(success, no_success) ~ year,
         data = df, family = "quasibinomial",
         na.action = na.exclude)
summary(m)
```
I changed to "quasibinomial" here because of overdispersion.
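A common check for the overdispersion mentioned above, sketched here assuming the data frame defined earlier: fit a plain binomial model, divide the sum of squared Pearson residuals by the residual degrees of freedom, and compare the result with 1. The names `m_bin`, `m_quasi` and `phi` are only illustrative. This Pearson-based estimate is also what `summary()` reports as the dispersion parameter of the quasibinomial fit, and both families give identical coefficients; the quasibinomial fit only scales the standard errors by the square root of the dispersion.

```r
# Same data as above, rebuilt so the snippet runs on its own
year <- rep(0:3, each = 18)
success <- c(1, 0, 3, 1, 1, 2, 6, 0, 1, 1, 12, 2, NA, 6, 12, 0, 10,
             7, 4, 10, 13, 1, 2, 1, 18, 6, 3, 8, 3, 1, 9, 15, 6, 12,
             6, 15, 13, 6, 8, 6, 2, 11, 6, 1, 12, 0, 4, 15, 0, 3, 18,
             5, 6, 17, 5, 3, 17, 8, 0, 7, 12, 10, 26, 12, 4, 17, 1, 8,
             2, 7, 14, 8)
no_success <- c(1, 9, 5, 4, 6, 1, 4, 4, 6, 10, 16, 4, NA, 3, NA, 3,
                5, 5, 6, 10, 0, 5, 3, 10, 1, 7, 11, 8, 20, 4, 3, 3,
                19, 1, 11, 4, 6, 4, 9, 4, 10, 4, 2, 8, 3, 1, 13, 3,
                5, 7, 5, 9, 3, 6, 3, 4, 3, 13, 6, 5, 10, 3, 1, 0,
                18, 6, 13, 0, 3, 2, 2, 2)
df <- data.frame(year, success, no_success)

# The same model fitted with the binomial and the quasibinomial family
m_bin   <- glm(cbind(success, no_success) ~ year, data = df,
               family = binomial, na.action = na.exclude)
m_quasi <- glm(cbind(success, no_success) ~ year, data = df,
               family = quasibinomial, na.action = na.exclude)

# Pearson-based dispersion: sum of squared Pearson residuals / residual df.
# na.rm = TRUE because na.exclude pads the residuals with NA for dropped rows.
phi <- sum(residuals(m_bin, type = "pearson")^2, na.rm = TRUE) /
  df.residual(m_bin)
phi                          # values well above 1 indicate overdispersion
summary(m_quasi)$dispersion  # the same estimate, used to inflate the SEs

# Point estimates are identical; only the standard errors differ
all.equal(coef(m_bin), coef(m_quasi))
```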
From the summary I see that there is a significant effect of year (p = 0.0219).
As the coefficients in a binomial glm are on the log-odds scale, the slope for year is a log odds ratio, and exp(estimate) = exp(0.3099) ≈ 1.363.
So the odds of success are multiplied by about 1.363 per year.
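To make the multiplicative reading concrete, a small sketch that uses only the reported estimate: the percent change in the odds per year, and how the effect compounds over several years. The variable names and the year 0 versus year 3 comparison are illustrative choices, not part of the original analysis.

```r
b <- 0.3099                      # slope for year on the log-odds scale, from summary(m)
or_year  <- exp(b)               # odds ratio for a one-year increase, about 1.363
pct_year <- (or_year - 1) * 100  # percent change in the odds per year, about +36.3
or_3yr   <- exp(3 * b)           # odds ratio year 3 vs year 0 (= or_year^3), about 2.53
```

The effect is a ratio of odds, not an additive change in the probability of success; how much the probability moves depends on where it starts.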
My questions are:
1.) When I exponentiate a negative estimate, the result is always positive; this cannot be correct. There must be a way to express negative relationships.
2.) When I visualize multiple linear models, I like to display the estimates. In a "normal" lm I would display the estimate and its confidence interval like this: divide the estimate by the mean of the observations, then subtract and add 1.96 times the standard error divided by the mean of the observations.
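```r
# Note: `or` is not one of the columns created above (year, success, no_success)
Estimate.mean <- exp(0.3099) / mean(df$or, na.rm = TRUE)   # scaled estimate
Std.Error.mean <- exp(0.1321) / mean(df$or, na.rm = TRUE)  # scaled standard error
low <- Estimate.mean - Std.Error.mean * 1.96
high <- Estimate.mean + Std.Error.mean * 1.96
```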
If this confidence interval does not touch the zero line, the effect should be significant, i.e. significantly different from zero.
But here the lower bound is -0.3901804 and the upper bound is 1.608095, so the interval includes zero. This does not look like a significant linear relationship, despite the low p-value from the glm (0.0219).
What have I mixed up here?
I am grateful for any suggestions.
The "zero line" in this case is x = 1, not x = 0: on the odds-ratio scale, "no effect" corresponds to an odds ratio of 1.
Question 2: the question is whether there is an effect that differs from no effect. An odds ratio of 1 (a log odds ratio of 0) basically means zero effect.
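For the numbers in this question, a minimal sketch of such an interval: take estimate ± critical value × SE on the log-odds scale and only then exponentiate, without dividing by any mean and without exponentiating the standard error on its own. I assume 68 residual degrees of freedom (72 rows, 2 dropped because of NA, 2 coefficients), since the quasibinomial summary bases its p-value on a t distribution; using 1.96 instead gives a slightly narrower interval.

```r
b  <- 0.3099      # estimate for year (log odds ratio), from summary(m)
se <- 0.1321      # its standard error, from summary(m)
df_resid <- 68    # assumed residual df: 70 complete rows minus 2 coefficients
crit <- qt(0.975, df = df_resid)   # about 2.00; qnorm(0.975) would give 1.96

ci_log <- b + c(-1, 1) * crit * se  # interval on the log-odds scale: compare with 0
ci_or  <- exp(ci_log)               # same interval as odds ratios: compare with 1
ci_or                               # roughly 1.05 to 1.77, so it does not include 1

# The matching two-sided t test uses the same estimate and SE
p <- 2 * pt(-abs(b / se), df = df_resid)   # about 0.022
```

Because the interval and the test use the same estimate and standard error, they agree: the log-scale interval excludes 0 exactly when the odds-ratio interval excludes 1.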
Question 1: Once the estimate is exponentiated, the result cannot be negative. But an odds ratio below 1 expresses a negative effect.
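A short illustration with a hypothetical negative slope of the same size (-0.3099, not a value from the fitted model): the exponentiated value stays positive but falls below 1, and its reciprocal is the same effect read from the other outcome.

```r
b_neg   <- -0.3099            # hypothetical negative slope, for illustration only
or_neg  <- exp(b_neg)         # about 0.734: positive, but below 1
pct_neg <- (or_neg - 1) * 100 # about -26.6: the odds fall by roughly 27 % per year
1 / or_neg                    # about 1.363: the odds of no_success rise by this factor
```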
Here are some sources on calculating confidence intervals, for anyone stumbling over this post:
https://fromthebottomoftheheap.net/2018/12/10/confidence-intervals-for-glms/
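Following the idea in the linked post, a minimal sketch for intervals around the fitted probabilities: predict on the link (log-odds) scale with standard errors, build the interval there, and back-transform with the inverse link. The t quantile with the residual degrees of freedom (because the quasibinomial dispersion is estimated) is my assumption; the post itself may use a different multiplier. `new_years`, `pred`, `crit` and `ilink` are hypothetical names.

```r
# Same data as in the question, rebuilt so the snippet runs on its own
year <- rep(0:3, each = 18)
success <- c(1, 0, 3, 1, 1, 2, 6, 0, 1, 1, 12, 2, NA, 6, 12, 0, 10,
             7, 4, 10, 13, 1, 2, 1, 18, 6, 3, 8, 3, 1, 9, 15, 6, 12,
             6, 15, 13, 6, 8, 6, 2, 11, 6, 1, 12, 0, 4, 15, 0, 3, 18,
             5, 6, 17, 5, 3, 17, 8, 0, 7, 12, 10, 26, 12, 4, 17, 1, 8,
             2, 7, 14, 8)
no_success <- c(1, 9, 5, 4, 6, 1, 4, 4, 6, 10, 16, 4, NA, 3, NA, 3,
                5, 5, 6, 10, 0, 5, 3, 10, 1, 7, 11, 8, 20, 4, 3, 3,
                19, 1, 11, 4, 6, 4, 9, 4, 10, 4, 2, 8, 3, 1, 13, 3,
                5, 7, 5, 9, 3, 6, 3, 4, 3, 13, 6, 5, 10, 3, 1, 0,
                18, 6, 13, 0, 3, 2, 2, 2)
df <- data.frame(year, success, no_success)

m <- glm(cbind(success, no_success) ~ year, data = df,
         family = quasibinomial, na.action = na.exclude)

# Predict on the link (log-odds) scale, where a symmetric interval makes sense
new_years <- data.frame(year = 0:3)
pred <- predict(m, newdata = new_years, type = "link", se.fit = TRUE)

# Build the interval on the link scale, then back-transform with the inverse link
crit  <- qt(0.975, df = df.residual(m))  # t quantile: the dispersion is estimated
ilink <- family(m)$linkinv                # inverse logit for this model
new_years$fit   <- ilink(pred$fit)
new_years$lower <- ilink(pred$fit - crit * pred$se.fit)
new_years$upper <- ilink(pred$fit + crit * pred$se.fit)
new_years  # fitted probability of success per year with its interval
```

Back-transforming the endpoints keeps the interval inside 0 to 1, which a symmetric interval on the probability scale would not guarantee.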