r, r-caret, naivebayes

caret::predict giving Error: $ operator is invalid for atomic vectors


This has been driving me crazy. I've been looking through similar posts all day but can't seem to solve my problem. I have a naive Bayes model trained and stored as model. I'm trying to predict on a newdata data frame, but I keep getting Error: $ operator is invalid for atomic vectors. Here is what I am running: stats::predict(model, newdata = newdata), where newdata is the first row of another data frame: newdata <- pbp[1, c("balls", "strikes", "outs_when_up", "stand", "pitcher", "p_throws", "inning")]

class(newdata) gives [1] "tbl_df" "tbl" "data.frame".


Solution

  • The issue is with the data passed to predict: its column types and factor levels should match those used in training. For example, if we predict on one of the rows from trainingData, it works:

    predict(model, head(model$trainingData, 1))
    #[1] Curveball
    #Levels: Changeup Curveball Fastball Sinker Slider
    

    Comparing the str() output of both datasets shows that some columns stored as factors in the training data are character or integer in newdata:

    str(model$trainingData)
    'data.frame':   1277525 obs. of  7 variables:
     $ pitcher     : Factor w/ 1390 levels "112526","115629",..: 277 277 277 277 277 277 277 277 277 277 ...
     $ stand       : Factor w/ 2 levels "L","R": 1 1 2 2 2 2 2 1 1 1 ...
     $ p_throws    : Factor w/ 2 levels "L","R": 2 2 2 2 2 2 2 2 2 2 ...
     $ balls       : num  0 1 0 1 2 2 2 0 0 0 ...
     $ strikes     : num  0 0 0 0 0 1 2 0 1 2 ...
     $ outs_when_up: num  1 1 1 1 1 1 1 2 2 2 ...
     $ .outcome    : Factor w/ 5 levels "Changeup","Curveball",..: 3 4 1 4 1 5 5 1 1 5 ...
    
    str(newdata)
    tibble [1 × 6] (S3: tbl_df/tbl/data.frame)
     $ balls       : int 3
     $ strikes     : int 2
     $ outs_when_up: int 1
     $ stand       : chr "R"
     $ pitcher     : int 605200
     $ p_throws    : chr "R"
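
    With many columns, the same comparison can be done programmatically rather than by reading two str() listings. This is a minimal sketch on toy data: train and new_row are hypothetical stand-ins for model$trainingData and newdata, and needs_factor is a name made up for illustration.

    ```r
    # Toy stand-ins for model$trainingData and newdata (hypothetical values)
    train <- data.frame(
      stand = factor(c("L", "R", "R")),
      balls = c(0, 1, 2)
    )
    new_row <- data.frame(stand = "R", balls = 3L, stringsAsFactors = FALSE)

    # Columns present in both data frames
    shared <- intersect(names(train), names(new_row))

    # First class of each shared column, side by side
    class_cmp <- data.frame(
      column   = shared,
      training = vapply(train[shared], function(x) class(x)[1], character(1)),
      newdata  = vapply(new_row[shared], function(x) class(x)[1], character(1)),
      row.names = NULL,
      stringsAsFactors = FALSE
    )

    # Flag columns that are factors in training but not in the new data
    class_cmp$needs_factor <- class_cmp$training == "factor" &
      class_cmp$newdata != "factor"
    print(class_cmp)
    ```

    Only factor mismatches are flagged here. A numeric versus integer difference, as in balls, is listed but not flagged.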
    

    One option is to convert those columns in newdata to factors with the same levels as in the training data:

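    # columns present in both the training data and newdata
    nm1 <- intersect(names(model$trainingData), names(newdata))
    # of those, the ones stored as factors in the training data
    nm2 <- names(which(sapply(model$trainingData[nm1], is.factor)))
    # rebuild each one as a factor carrying the training levels
    newdata[nm2] <- Map(function(x, y) factor(x, levels = levels(y)), newdata[nm2], model$trainingData[nm2])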
    

    Now the prediction runs:

    predict(model, newdata)
    #[1] Sinker
    #Levels: Changeup Curveball Fastball Sinker Slider
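
    One caveat with aligning levels this way: factor() silently turns any value that is not among the supplied levels into NA, so a pitcher who never appeared in training would end up as a missing value. It may be worth checking for that before predicting. A small sketch with made-up IDs (train_pitcher and new_pitcher are hypothetical, and "999999" is assumed to be absent from training):

    ```r
    # Hypothetical training factor and new values to score
    train_pitcher <- factor(c("112526", "115629", "605200"))
    new_pitcher   <- c(605200L, 999999L)

    # Same conversion as above: reuse the training levels
    aligned <- factor(new_pitcher, levels = levels(train_pitcher))

    # Values that became NA were not seen during training
    unseen <- new_pitcher[is.na(aligned)]
    if (length(unseen) > 0) {
      warning("values not seen in training: ", paste(unseen, collapse = ", "))
    }
    ```

    How the model treats such NA values depends on the method, so rows flagged here are probably best handled separately.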