
Contrasts error when running logistic regression


I'm trying to fit a logistic regression on a subset of the data. This is my code:

     reg1 <- glm(smoke_binary~ Age + Marital.Status + Highest.Qualification, 
     data = subset(uf_train,(uf_train$Marital.Status=="Married" &
                             uf_train$Marital.Status=="Separated" & 
                             uf_train$Marital.Status=="Widowed" & 
                             uf_train$Marital.Status== "Divorced" & 
                             uf_train$Highest.Qualification=="GCSE/CSE" &
                             uf_train$Highest.Qualification=="O Level" &
                             uf_train$Highest.Qualification=="A Levels")),
     family=binomial)

But I keep getting this error, and I don't know what it means or how to fix it:

    Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
      contrasts can be applied only to factors with 2 or more levels


Solution

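  • Your subset selection is off. By using & ("AND") to combine mutually exclusive levels of the same variable, you end up with an empty data set (like saying "select all the M&Ms that are green AND brown"). With no rows left, glm() drops the unused factor levels, so Marital.Status and Highest.Qualification have fewer than 2 levels, and that is exactly what the contrasts error complains about. Within one variable you want OR (|), or more compactly %in%; & is only right for combining the conditions on the two different variables. When debugging, it helps to do the subset selection separately so that you can check the results: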

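    # %in% acts as OR within a variable; & combines the two variables
    glm_data <- subset(uf_train,
             Marital.Status %in% c("Widowed", "Married", "Separated", "Divorced") &
             Highest.Qualification %in% c("GCSE/CSE", "O Level", "A Levels"))
    # Check the subset is non-empty and each factor has 2+ levels with non-zero counts
    nrow(glm_data)
    table(glm_data$Marital.Status)
    table(glm_data$Highest.Qualification)
    # Then fit the model on the checked subset
    reg1 <- glm(smoke_binary ~ Age + Marital.Status + Highest.Qualification,
                data = glm_data, family = binomial)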
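To see the difference end to end without the original data, here is a small sketch on a made-up data frame. The name toy and all of its values are hypothetical stand-ins for uf_train (only the column names are copied from the question). It shows that chaining == with & on one variable always gives zero rows, that fitting on that empty subset reproduces the contrasts error, and that the %in% version fits normally.

    # Hypothetical toy data standing in for uf_train; values are random.
    set.seed(1)
    n <- 200
    toy <- data.frame(
      smoke_binary = rbinom(n, 1, 0.3),
      Age = sample(18:80, n, replace = TRUE),
      Marital.Status = factor(sample(c("Single", "Married", "Separated",
                                       "Widowed", "Divorced"), n, replace = TRUE)),
      Highest.Qualification = factor(sample(c("Degree", "GCSE/CSE", "O Level",
                                              "A Levels"), n, replace = TRUE))
    )

    # & between two levels of the SAME variable: no row can match both.
    bad <- subset(toy, Marital.Status == "Married" & Marital.Status == "Separated")
    nrow(bad)  # 0

    # %in% keeps rows matching ANY listed level (OR within each variable);
    # & still combines the conditions on the two different variables.
    good <- subset(toy,
                   Marital.Status %in% c("Married", "Separated", "Widowed", "Divorced") &
                   Highest.Qualification %in% c("GCSE/CSE", "O Level", "A Levels"))

    # Fitting on the empty subset: model.frame() drops unused levels, leaving
    # the factors with fewer than 2 levels, so model.matrix() raises the error.
    err <- tryCatch(
      glm(smoke_binary ~ Age + Marital.Status + Highest.Qualification,
          data = bad, family = binomial),
      error = function(e) conditionMessage(e)
    )
    print(err)

    # On the correctly filtered subset each factor keeps 2+ levels, so it fits.
    fit <- glm(smoke_binary ~ Age + Marital.Status + Highest.Qualification,
               data = good, family = binomial)
    summary(fit)

The same check-then-fit pattern applies to the real data: if nrow() on the subset is 0, or a table() shows only one level with non-zero counts, the glm() call will fail with this error.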