Search code examples
rglmpoisson

Poisson GLM for non-integer counts - R


I was hoping to get some advice on a GLM with Poisson family.

I have a data set with the number of bites taken by each individual over a period of time. As the individuals observed were feeding for different periods of time, when I calculated the bite rate for each individual as bites/minute, I have ended up with non-integer numbers. Now, based on what I have read so far, I should still be able to do a GLM with a Poisson family. However, I am encountering errors and I think this may be because R is not liking the fact I'm using non-integers. Does anyone have any advice?

Example <- structure(list(Species = c("Fish1", "Fish2", "Fish3", "Fish4", 
"Fish5", "Fish6", "Fish7", "Fish1", "Fish2", "Fish3", "Fish4", 
"Fish5", "Fish6", "Fish7", "Fish1", "Fish2", "Fish3", "Fish4", 
"Fish5", "Fish6", "Fish7"), Site = c(1, 1, 1, 1, 1, 1, 1, 2, 
2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3), Bite_Rate = c(3.5, 7.5, 
0, 0, 2.45, 5.5, 6.5, 6.5, 7.5, 8.03, 32.1, 15.6, 18.2, 19.1, 
20.5, 20.5, 3.5, 5.7, 6.7, 23.2, 0)), class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -21L), spec = structure(list(
   cols = list(Species = structure(list(), class = c("collector_character", 
   "collector")), Site = structure(list(), class = c("collector_double", 
   "collector")), Bite_Rate = structure(list(), class = c("collector_double", 
   "collector"))), default = structure(list(), class = c("collector_guess", 
   "collector")), skip = 1), class = "col_spec"))

str(Example) # check structure 
Example$Species<-as.factor(Example$Species) # set species as a factor 
str(Example) # check structure 
glm<-glm(Species~Bite_Rate, data=Example, family = poisson) # create the GLM

The error message when I run the GLM is:

Error in if (any(y < 0)) stop("negative values not allowed for the 'Poisson' family") : 
  missing value where TRUE/FALSE needed
In addition: Warning message:
In Ops.factor(y, 0) : ‘<’ not meaningful for factors 

I don't actually have any negative values which is what throws me off a little bit. Any advice would be much appreciated!

EDIT: Based on the comments, I have updated my example data so that it has counts of bites and the time of observation in seconds


Example <- structure(list(Species = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 
7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L), .Label = c("Fish1", 
"Fish2", "Fish3", "Fish4", "Fish5", "Fish6", "Fish7"), class = "factor"), 
    Site = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 
    3, 3, 3, 3, 3), Bites = c(0, 10, 18, 17, 6, 0, 1, 0, 19, 
    12, 7, 3, 5, 1, 5, 0, 10, 18, 17, 7, 25), Observed_Seconds = c(50, 
    33, 47, 20, 17, 10, 14, 21, 48, 10, 50, 33, 47, 20, 17, 10, 
    14, 21, 48, 10, 90)), row.names = c(NA, -21L), spec = structure(list(
    cols = list(Species = structure(list(), class = c("collector_character", 
    "collector")), Site = structure(list(), class = c("collector_double", 
    "collector")), Bites = structure(list(), class = c("collector_double", 
    "collector")), Observed_Seconds = structure(list(), class = c("collector_double", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), skip = 1), class = "col_spec"), class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"))

Thank you!


Solution

  • You need to get the seconds (denominator) with which you normalized, and the actual bites (counts).

    Next include the minutes as an offset, and note, your response variable is on the left hand side of ~ :

    fit = glm(Bites ~ Species,offset=log(Observed_Seconds),
    family=poisson,data=Example)
    

    You can look at the summary:

    summary(fit)
    
       Coefficients:
                 Estimate Std. Error z value Pr(>|z|)    
    SpeciesFish1  -2.8679     0.4472  -6.413 1.42e-10 ***
    SpeciesFish2  -1.1436     0.1857  -6.158 7.35e-10 ***
    SpeciesFish3  -0.5738     0.1581  -3.629 0.000284 ***
    SpeciesFish4  -0.7732     0.1543  -5.011 5.42e-07 ***
    SpeciesFish5  -1.3269     0.1961  -6.766 1.33e-11 ***
    SpeciesFish6  -1.7198     0.2887  -5.958 2.56e-09 ***
    SpeciesFish7  -1.5244     0.1925  -7.921 2.35e-15 ***
    

    Seems very significant, but ne good thing to check is whether the data is overdispersed, and also include other factors (e.g Site):

    fit_quasi = glm(Bites ~ Species + factor(Site),offset=log(Observed_Seconds),
              family=quasipoisson,data=Example)
    summary(fit_quasi)
    
    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)  
    (Intercept)    -2.9754     1.0713  -2.777   0.0167 *
    SpeciesFish2    1.8487     1.1434   1.617   0.1319  
    SpeciesFish3    2.2731     1.1152   2.038   0.0642 .
    SpeciesFish4    2.1246     1.1205   1.896   0.0823 .
    SpeciesFish5    1.3533     1.1604   1.166   0.2662  
    SpeciesFish6    1.2754     1.2658   1.008   0.3336  
    SpeciesFish7    0.8922     1.1719   0.761   0.4612  
    factor(Site)2  -0.2325     0.5132  -0.453   0.6587  
    factor(Site)3   0.6118     0.4677   1.308   0.2154  
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
    (Dispersion parameter for quasipoisson family taken to be 5.521045)
    

    If it follows a poisson, the dispersion will be around 1, but in this case, it is overdispersed.