Search code examples
rnagenetics

What is an alternative to leap() that can handle NAs?


Need to apply Branch and bound method to choose best model. leaps() from leaps package works well, only if the data has no NA values, otherwise throws an error:

#dummy data
x<-matrix(rnorm(100),ncol=4)
#convert to 0,1,2 - this is a genetic data, NA=NoCall
x<-matrix(round(runif(100)*10) %% 3,ncol=4)
#introduce NA=NoCall
x[1,1] <-NA
#response, case or control
y<-rep(c(0,1,1,0,1),5)
leaps(x,y)

Error in leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = NCOL(x) + int,  : 
  NA/NaN/Inf in foreign function call (arg 4)

Using only complete.cases() is not an option as I lose 80% of data.

What is an alternative to leap that can handle NAs? I am writing my own function to do something similar, but it is getting big and clunky, I feel like I am reinventing the wheel...

UPDATE: I have tried using stepAIC(), facing the same problem of missing data:

Error in stepAIC(fit) : 
  number of rows in use has changed: remove missing values?

Solution

  • This is a stats problem, as AIC can't compare models built with different data sets. So to compare models with and without certain variables, you need to remove the rows with missing values for those variables. You may need to "reconsider your modeling strategy", to quote Ben Bolker. Otherwise you may also want to look into variants of AIC, a quick Google search brings up a recent JASA article that might be a good starting point.

    - Aaron