r glm lm

How do I interpret the coefficients of a glm with a binomial error distribution?


I would be happy if someone could help me understand glms with a binomial error distribution.

Let's assume the following data frame:

year<-c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
        3, 3, 3, 3, 3, 3, 3, 3, 3, 3,3, 3, 3, 3, 3, 3, 3, 3)


success<-c(1,  0,  3,  1,  1,  2,  6,  0,  1,  1, 12,  2, NA,  6, 12,  0, 10,
           7,  4, 10, 13,  1,  2,  1, 18,  6,  3,  8,  3,  1,  9, 15,  6, 12,
           6, 15, 13,  6,  8,  6,  2, 11,  6, 1, 12,  0,  4, 15,  0,  3, 18,
           5,  6, 17,  5,  3, 17,  8,  0,  7, 12, 10, 26, 12,  4, 17,  1,  8,
           2,  7, 14,  8)

no_success<-c(1,  9,  5,  4,  6,  1,  4,  4,  6, 10, 16,  4, NA,  3, NA,  3,
              5,  5,  6, 10,  0,  5,  3, 10,  1,  7, 11,  8, 20,  4,  3,  3,
              19,  1, 11,  4,  6,  4,  9,  4, 10,  4,  2, 8,  3,  1, 13,  3,
              5,  7,  5,  9,  3,  6,  3,  4,  3, 13,  6,  5, 10,  3,  1,  0,
              18,  6, 13,  0,  3,  2,  2,  2)


df<-data.frame(year,success,no_success)

df$success<-as.integer(df$success)
df$no_success<-as.integer(df$no_success)

To find out whether the success or no_success of a made-up treatment increases or decreases linearly with year, I fit a binomial glm:

m<- glm(cbind(success, no_success)~year,
        data=df, family = "quasibinomial",
        na.action=na.exclude)
summary(m)

I changed to "quasibinomial" here because of overdispersion.

From the summary I see that year has a significant effect (p = 0.0219 *).

As the coefficients of a binomial glm are on the log-odds scale, I get exp(estimate) = exp(0.3099) = 1.363.

So the odds of success are multiplied by 1.363 per year, i.e. they increase by about 36%.
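As a small sketch of that arithmetic (the variable names are only illustrative, and 0.3099 is the estimate for year copied from the summary output), this is how I read the coefficient as a multiplicative change in the odds:

```r
# Estimate for `year` on the log-odds scale, copied from summary(m)
est <- 0.3099

# Exponentiating gives the factor by which the odds change per year
odds_ratio <- exp(est)

# The same factor expressed as a percentage change in the odds
pct_change <- (odds_ratio - 1) * 100

# The effect is multiplicative: over 3 years the odds change by odds_ratio^3
odds_ratio_3y <- exp(3 * est)

print(round(c(odds_ratio = odds_ratio,
              pct_change = pct_change,
              odds_ratio_3y = odds_ratio_3y), 3))
```

Note that this describes the odds, not the probability of success; the change in probability also depends on the intercept.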

My Questions are:

1.) When I take exp() of a negative estimate, the result is always positive. This cannot be correct; there must be a way to express negative relationships.

2.) When I want to visualize multiple linear models, I like to display the estimates. In a "normal" lm I would display the estimate and confidence interval like this: divide the estimate by the mean of the observations, then subtract and add 1.96 times the standard error divided by the mean of the observations.

  # Note: df$or is not a column of the data frame defined above
  Estimate.mean<-exp(0.3099)/mean(df$or,na.rm=TRUE)

  Std.Error.mean<-exp(0.1321)/mean(df$or,na.rm=TRUE)


  low<-Estimate.mean-Std.Error.mean*1.96
  high<-Estimate.mean+Std.Error.mean*1.96

If this confidence interval does not touch the zero line, the effect should be significant, i.e. significantly different from zero.

But here the lower bound is -0.3901804 and the upper bound is 1.608095. This does not look like a significant linear relationship, despite the low p-value from the glm (0.0219).

What have I mixed up here?

I would be grateful for any suggestions.


Solution

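  • For an odds ratio, the "zero line" (no effect) is at 1, not at 0.

    Question 2: The question is whether there is an effect that differs from zero. A coefficient of zero on the log-odds scale corresponds to an odds ratio of exp(0) = 1, so on the odds scale the interval has to be compared against 1. The interval itself should be built on the log-odds scale (estimate ± 1.96 × standard error) and only then exponentiated; exponentiating the standard error on its own and adding it to the odds ratio does not give a valid interval.

    Question 1: An exponentiated estimate can never be negative. A negative relationship shows up as an odds ratio below 1.

    Here are some sources on calculating the confidence interval, for anyone stumbling over this post: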

    https://fromthebottomoftheheap.net/2018/12/10/confidence-intervals-for-glms/

    https://stats.stackexchange.com/questions/304833/how-to-calculate-odds-ratio-and-95-confidence-interval-for-logistic-regression
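The points above can be sketched with the numbers quoted in the question. This is only a minimal illustration, assuming a Wald interval with the normal quantile 1.96 (for a quasibinomial fit, summary() reports a t-based p-value, so the result is approximate); the variable names are made up for the example.

```r
# Values copied from the summary(m) output quoted in the question
est <- 0.3099   # slope for year on the log-odds scale
se  <- 0.1321   # its standard error, also on the log-odds scale

# Wald interval on the log-odds scale; "no effect" is 0 here
ci_log <- est + c(-1, 1) * 1.96 * se

# Exponentiate the estimate and both limits; "no effect" is now 1
odds_ratio <- exp(est)
ci_or <- exp(ci_log)
print(round(c(OR = odds_ratio, lower = ci_or[1], upper = ci_or[2]), 3))

# A negative slope of the same size gives an odds ratio between 0 and 1
or_neg <- exp(-est)
print(round(or_neg, 3))
```

The interval on the odds scale runs from roughly 1.05 to 1.77 and therefore excludes 1, which agrees with the significant p-value. With the fitted model `m` from the question, `exp(confint.default(m))` should give the same kind of Wald interval directly.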