Search code examples
rrandomglmr-factor

How to treat a variable as random factor in GLM in R


I am doing statistical analysis for a dataset using GLM in R. Basically the predictor variables are: "Probe"(types of probes used in the experiment - Factor with 4 levels), "Extraction"(types of extraction used in the experiment - Factor with 2 levels), "Tank"(the tank number that the sample is collected from - integers from 1 to 9), and "Dilution"(the dilution of each sample - numbers: 3.125, 6.25, 12.5, 25, 50, 100). The response is the number of positive responses ("Positive") obtained from a number of repetition of the experiment ("Rep"). I want to assess the effects of all predictor variables (and their interactions) on the number of positive responses, so I tried to fit a GLM model like this:

y<-cbind(mydata$Positive,mydata$Rep - mydata$Positive)
model1<-glm(y~Probe*Extraction*Dilution*Tank, family=quasibinomial, data=mydata)

But I was later advised by my supervisor that the "Tank" predictor variable should not be treated as a level-based variable. i.e. it has values of 1 to 9, but it's just the tank label so the difference between 1 and, say, 7 is not important. Treating this variable as factor would only make a large model with bad results. So how to treat the "Tank" variable as a random factor and include it in the GLM?

Thanks


Solution

  • It is called a "mixed effect model". Check out the lme4 package.

    library(lme4)
    glmer(y~Probe + Extraction + Dilution + (1|Tank), family=binomial, data=mydata)
    

    Also, you should probably use + instead of * to add factors. * includes all interactions and levels of each factor, which would lead to a huge overfitting model. Unless you have a specific reason to believe that there is interaction, in which case you should code that interaction explicitly.