
How to solve the wrong variable type error when handling an imbalanced dataset with ROSE in R?


I am learning R with the Fraud Transaction data. When I try to use ROSE to handle the imbalanced dataset, I get an error saying that ROSE handles only continuous and categorical variables.

Here's what I tried:

str(dataset)
'data.frame':   6362620 obs. of  13 variables:
 $ step            : int  1 1 1 1 1 1 1 1 1 1 ...
 $ type            : chr  "PAYMENT" "PAYMENT" "TRANSFER" "CASH_OUT" ...
 $ amount          : num  9840 1864 181 181 11668 ...
 $ nameOrig        : chr  "C1231006815" "C1666544295" "C1305486145" "C840083671" ...
 $ oldbalanceOrg   : num  170136 21249 181 181 41554 ...
 $ newbalanceOrig  : num  160296 19385 0 0 29886 ...
 $ nameDest        : chr  "M1979787155" "M2044282225" "C553264065" "C38997010" ...
 $ oldbalanceDest  : num  0 0 0 21182 0 ...
 $ newbalanceDest  : num  0 0 0 0 0 ...
 $ isFraud         : int  0 0 1 1 0 0 0 0 0 0 ...
 $ isFlaggedFraud  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ balancedOfOrigin: num  -9840 -1864 -181 -181 -11668 ...
 $ balancedOfDest  : num  0 0 0 21182 0 ...

datadata_ROSE <- ROSE(isFraud~., data = dataset, N = 500, seed = 111)$data

With Error:

Error in rose.sampl(n, N, p, ind.majo, majoY, ind.mino, minoY, y, classy, : The current implementation of ROSE handles only continuous and categorical variables.

Debugging:

# change the isFraud attribute into category 0/1
dataset$isFraud = as.factor(dataset$isFraud)
datadata_ROSE <- ROSE(isFraud~., data = dataset, N = 500, seed = 111)$data

Even after this change, the error is still there. How can I prepare the dataset so that it works with ROSE?


Solution

  • As you can see in the str output, type, nameOrig, and nameDest are still character, not factor. ROSE will work once you change them to factors. However, looking at nameOrig and nameDest, they do not seem appropriate to include in ROSE.

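    library(dplyr)  # provides %>% and mutate()
    library(ROSE)
    
    # work on a small sample of the data
    dummy2 <- head(dataset, 100)
    
    dummy2$isFraud = as.factor(dummy2$isFraud)
    
    # additional part: convert the character columns to factors
    dummy2 <- dummy2 %>%
      mutate(type = factor(type),
             nameDest = factor(nameDest),
             nameOrig = factor(nameOrig))
    dummy3 <- ROSE(isFraud~., data = dummy2, N = 500, seed = 111)$data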
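If you would rather leave nameOrig and nameDest out, as suggested above, something like the following might work. It is only a sketch: prepare_for_rose is a hypothetical helper (not part of ROSE), the toy data frame uses made-up values that merely mimic a few columns from the question, and the check for offending columns assumes ROSE accepts only numeric, integer, and factor columns, as the error message implies.

```r
# Hypothetical helper (not part of ROSE): drop the given columns, convert the
# remaining character columns to factors, and make the response a factor.
prepare_for_rose <- function(df, response, drop_cols = character(0)) {
  df <- df[, setdiff(names(df), drop_cols), drop = FALSE]
  df[] <- lapply(df, function(x) if (is.character(x)) factor(x) else x)
  df[[response]] <- factor(df[[response]])
  df
}

# Toy data with made-up values, shaped like a few columns of the question's data.
toy <- data.frame(
  type     = rep(c("PAYMENT", "TRANSFER", "CASH_OUT", "PAYMENT"), 5),
  amount   = seq(100, 2000, by = 100),
  nameOrig = paste0("C", 1:20),
  nameDest = paste0("M", 1:20),
  isFraud  = rep(c(0L, 0L, 0L, 1L), 5),
  stringsAsFactors = FALSE
)

# Columns that are neither numeric/integer nor factor are the likely offenders.
is_ok <- vapply(toy, function(x) is.numeric(x) || is.factor(x), logical(1))
offenders <- names(toy)[!is_ok]
print(offenders)

# Drop the ID-like columns and convert what is left.
clean <- prepare_for_rose(toy, "isFraud", drop_cols = c("nameOrig", "nameDest"))
str(clean)

# Run ROSE only if the package is installed.
if (requireNamespace("ROSE", quietly = TRUE)) {
  balanced <- ROSE::ROSE(isFraud ~ ., data = clean, N = 500, seed = 111)$data
  print(table(balanced$isFraud))
}
```

On the real data you would pass dataset instead of toy. Whether dropping the two name columns is the right call depends on what you want the model to learn, so treat it as a starting point.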