
Attempting to run a negative binomial regression using the MASS package in R


I am trying to run a negative binomial regression on the following data:

df <- structure(list(Year = c("2018", "2018", "2018", "2018", "2018", 
"2018", "2018", "2018", "2018", "2018", "2018", "2018", "2019", 
"2019", "2019", "2019", "2019", "2019"), Month = c("1", "10", 
"11", "12", "2", "3", "4", "5", "6", "7", "8", "9", "1", "2", 
"3", "4", "5", "6"), count = c(109L, 91L, 73L, 74L, 94L, 113L, 
92L, 100L, 114L, 111L, 106L, 86L, 116L, 92L, 94L, 84L, 78L, 98L
), year_mon = c("2018 - 1", "2018 - 10", "2018 - 11", "2018 - 12", 
"2018 - 2", "2018 - 3", "2018 - 4", "2018 - 5", "2018 - 6", "2018 - 7", 
"2018 - 8", "2018 - 9", "2019 - 1", "2019 - 2", "2019 - 3", "2019 - 4", 
"2019 - 5", "2019 - 6")), row.names = c(NA, -18L), groups = structure(list(
    Year = c("2018", "2019"), .rows = structure(list(1:12, 13:18), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), row.names = 1:2, class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"))

I'm assuming this is the best regression technique for these data other than Poisson regression. I run the following:

library(MASS)
summary(glm.nb(count ~ year_mon, data=df))

and I get this error:

Error in while ((it <- it + 1) < limit && abs(del) > eps) { :
  missing value where TRUE/FALSE needed

I'm unsure what exactly I am doing wrong here. I'm not attached to the negative binomial model, but I want another model to compare against besides Poisson, and this one looks like a good fit.


Solution

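  • As @rawr says, you need to convert the predictor variable to a numeric value. As a character variable, year_mon is treated as categorical, and since each of its 18 levels occurs exactly once in the 18 rows, the model has one parameter per observation and fits every point exactly. That leaves no residual variation from which glm.nb() can estimate the dispersion parameter theta, so its theta-estimation loop (MASS::theta.ml(), whose while condition appears in the error message) breaks down. This works, for example: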

    glm.nb(count~as.numeric(factor(year_mon)), data=df)
    

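    ... although it's probably clearer to modify the variable in your data frame first (or create a new variable there) rather than converting it on the fly. Note also that factor() sorts its levels alphabetically, so as.numeric(factor(year_mon)) places "2018 - 10", "2018 - 11" and "2018 - 12" before "2018 - 2" rather than in calendar order; a month index computed from Year and Month gives a genuinely chronological trend.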
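Here is a minimal sketch of the "new variable in the data frame" approach. It assumes a plain data.frame rebuilt from the dput() output above (the dplyr grouping is dropped) and adds a hypothetical column, time_index, counting months with January 2018 = 1. It first confirms that year_mon as a categorical predictor saturates the model, then fits Poisson and negative binomial models on the chronological index and compares them by AIC. This is one way to set it up, not the only reasonable model for these data.

```r
library(MASS)  # provides glm.nb()

# Plain data.frame rebuilt from the question's dput() output.
# stringsAsFactors = FALSE keeps Year/Month as character on R < 4.0 too,
# so as.integer() below converts the text rather than factor codes.
df <- data.frame(
  Year  = rep(c("2018", "2019"), times = c(12, 6)),
  Month = c("1", "10", "11", "12", "2", "3", "4", "5", "6", "7", "8", "9",
            "1", "2", "3", "4", "5", "6"),
  count = c(109L, 91L, 73L, 74L, 94L, 113L, 92L, 100L, 114L, 111L, 106L, 86L,
            116L, 92L, 94L, 84L, 78L, 98L),
  stringsAsFactors = FALSE
)
df$year_mon <- paste(df$Year, "-", df$Month)  # same labels as the question, e.g. "2018 - 1"

# Diagnosis: one level of year_mon per row means the categorical model is
# saturated, i.e. it has no residual degrees of freedom left.
sat_pois <- glm(count ~ year_mon, family = poisson, data = df)
sat_pois$df.residual  # 0

# Hypothetical chronological predictor: months elapsed, January 2018 = 1.
df$time_index <- (as.integer(df$Year) - 2018L) * 12L + as.integer(df$Month)

# Poisson and negative binomial fits with a linear time trend.
fit_pois <- glm(count ~ time_index, family = poisson, data = df)
fit_nb   <- glm.nb(count ~ time_index, data = df)

summary(fit_nb)
AIC(fit_pois, fit_nb)  # lower AIC is preferred after penalising the extra theta parameter
```

The AIC comparison is just one convenient way to set the two models side by side, since the negative binomial model adds a single parameter (theta) to the Poisson model with the same linear predictor.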