How to solve this error when applying SMOTE in R?


I am trying to apply SMOTE to my dataset with the following code:

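    library(DMwR)  # provides SMOTE()
    dataset$target <- as.factor(dataset$target)
    dataset <- SMOTE(target ~ ., dataset, perc.over = 100, perc.under = 200)
    # as.character() first: as.numeric() on a factor would return the codes 1/2
    dataset$target <- as.numeric(as.character(dataset$target))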

But I got the following warnings and error:

    Warning message in smote.exs(data[minExs, ], ncol(data), perc.over, k):
    “NAs introduced by coercion”
    Warning message in smote.exs(data[minExs, ], ncol(data), perc.over, k):
    “NAs introduced by coercion”
    Warning message in smote.exs(data[minExs, ], ncol(data), perc.over, k):
    “NAs introduced by coercion”
    Error in factor(newCases[, a], levels = 1:nlevels(data[, a]), labels = levels(data[, : invalid 'labels'; length 0 should be 1 or 2
    Traceback:

    1. SMOTE(target ~ ., dataset, perc.over = 100, perc.under = 200)
    2. smote.exs(data[minExs, ], ncol(data), perc.over, k)
    3. factor(newCases[, a], levels = 1:nlevels(data[, a]), labels = levels(data[, 
     .     a]))
    4. stop(gettextf("invalid 'labels'; length %d should be 1 or %d", 
     .     nlab, length(levels)), domain = NA)
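
The call in frame 3 of the traceback can be reproduced in base R. This sketch assumes, as the answer below suggests, that the column being rebuilt is a character vector; chr_col, fac_col and their values are hypothetical. It mimics the failing call from the traceback and is not the DMwR source itself.

```r
# Hypothetical character column standing in for a non-numeric column of the data
chr_col <- c("a", "b", "c")

# Coercing strings to numbers gives NA (and would warn "NAs introduced by coercion")
codes <- suppressWarnings(as.integer(chr_col))
print(codes)              # all NA

# A character vector has no levels: nlevels() is 0 and levels() is NULL
print(nlevels(chr_col))   # 0
print(levels(chr_col))    # NULL

# Same shape of call as frame 3: levels = 1:0 has length 2, labels has length 0
err <- tryCatch(
  factor(codes, levels = 1:nlevels(chr_col), labels = levels(chr_col)),
  error = function(e) conditionMessage(e)
)
print(err)                # "invalid 'labels'; length 0 should be 1 or 2"

# A factor column carries its levels, so the same call rebuilds it correctly
fac_col <- factor(chr_col)
rebuilt <- factor(as.integer(fac_col),
                  levels = 1:nlevels(fac_col),
                  labels = levels(fac_col))
print(identical(rebuilt, fac_col))  # TRUE
```

Because a factor keeps its levels, the same call succeeds once the column is a factor, which is consistent with either dropping or converting the character columns.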

The target column contains only 0 and 1:

    str(dataset$target)

which returns:

     Factor w/ 2 levels "0","1": 1 1 2 2 1 1 1 1 1 1 ...
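
The 1 1 2 2 ... at the end of that str() output are the factor's internal integer codes, not the original values. This matters when converting the target back to numbers after SMOTE: as.numeric() applied directly to a factor returns these codes (1 and 2) instead of 0 and 1. A small sketch with a made-up target vector:

```r
# Made-up 0/1 target stored as a factor, like dataset$target after as.factor()
target <- factor(c(0, 0, 1, 1, 0))
str(target)  # Factor w/ 2 levels "0","1": 1 1 2 2 1

# as.numeric() on a factor returns the internal level codes
codes <- as.numeric(target)
print(codes)   # 1 1 2 2 1

# Going through the labels recovers the original values
values <- as.numeric(as.character(target))
print(values)  # 0 0 1 1 0

# Equivalent form: index the numeric levels by the factor's codes
values2 <- as.numeric(levels(target))[target]
print(identical(values, values2))  # TRUE
```

The as.numeric(levels(f))[f] form is the one recommended in R's ?factor help page.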

What is the problem here? I can't make sense of the error message.


Solution

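  • I think the problem is the character columns in your data frame. SMOTE can generate synthetic values for numeric and factor columns, but not for character columns. This fits the error: as frame 3 of the traceback shows, SMOTE rebuilds each nominal column with factor(newCases[, a], levels = 1:nlevels(data[, a]), labels = levels(data[, a])). For a character column, levels() is NULL (length 0) while 1:nlevels() is 1:0 (length 2), hence "invalid 'labels'; length 0 should be 1 or 2". The "NAs introduced by coercion" warnings are consistent with the same cause, since arbitrary strings cannot be converted to numbers. A possible solution is to drop the character columns: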

    library(data.table)
    library(DMwR)  # provides SMOTE()
    dataset <- fread("D:/archive/df.csv")
    set.seed(4)
    # Sample 10000 rows just to keep the computation fast
    dataset <- dataset[sample(1:nrow(dataset), 10000), ]
    dataset <- as.data.frame(dataset)
    # SMOTE expects the target as a factor
    dataset$isFraud <- factor(dataset$isFraud)
    table(dataset$isFraud)
    str(dataset)  # character columns show up as "chr"
    # Drop the character columns, which SMOTE cannot rebuild
    dataset <- dataset[, !sapply(dataset, is.character)]
    new.dataset <- SMOTE(isFraud ~ ., dataset, perc.over = 100, perc.under = 200)
    table(new.dataset$isFraud)
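
If the character columns carry information worth keeping, converting them to factors instead of dropping them may also work, since SMOTE rebuilds nominal columns from their levels and a factor has levels. This is an untested alternative rather than part of the answer above: the data frame below (columns amount, type, isFraud) is made up, and the sketch only shows the conversion step. Columns with a very large number of distinct values, such as identifiers, are probably still better dropped.

```r
# Made-up data frame: a numeric column, a character column and a 0/1 factor target
df <- data.frame(
  amount  = c(10.5, 200, 35, 980, 12),
  type    = c("a", "b", "a", "c", "b"),
  isFraud = factor(c(0, 0, 0, 1, 0)),
  stringsAsFactors = FALSE
)

# Find the character columns (the ones SMOTE cannot rebuild)
chr_cols <- names(df)[vapply(df, is.character, logical(1))]
print(chr_cols)  # "type"

# Convert them to factors in place so that levels() is no longer NULL
df[chr_cols] <- lapply(df[chr_cols], factor)
str(df)
```

After this conversion, the SMOTE() call from the answer can be run unchanged on the real dataset.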