I am trying to do classification with randomForest, but I keep getting the error below and cannot find a solution (randomForest has worked well for me for regression in the past). My code is below. 'success' is a factor, and all of the predictor (independent) variables are numeric. Any suggestions on how to run this classification properly?
```r
rf_model <- randomForest(success ~ ., data = data.train,
                         xtest = data.test[, 2:9], ytest = data.test[, 1],
                         importance = TRUE, proximity = TRUE)
Error in randomForest.default(m, y, ...) :
  NA/NaN/Inf in foreign function call (arg 1)
```
Also, here is a sample of the dataset:

```
head(data)
success duration  goal reward_count updates_count comments_count backers_count min_reward_level max_reward_level
   True 20.00000  1500           10            14              2            68                1             1000
   True 30.00000  3000           10             4              3            48                5             1000
   True 24.40323 14000           23             6             10           540                5             1250
   True 31.95833 30000            9            17              7           173                1            10000
   True 28.13211  4000           10            23             97          2936               10              550
   True 30.00000  6000           16            16            130          2043               25              500
```
Did you try regression on the same data? If not, check your data for Inf values and remove any you find, in addition to removing NA and NaN values. This question has useful information on finding Inf values:

R: is there a way to find Inf/-Inf values?
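A minimal base-R sketch of that check, assuming the data are in a data frame. count_nonfinite and drop_nonfinite are hypothetical helper names for illustration, not part of randomForest. is.finite() returns FALSE for NA, NaN, Inf and -Inf alike, so a single test covers all four in numeric columns; complete.cases() additionally catches NA in factor or character columns.

```r
# Hypothetical helper: count NA/NaN/Inf/-Inf values in each numeric column.
count_nonfinite <- function(df) {
  num_cols <- vapply(df, is.numeric, logical(1))
  vapply(df[num_cols], function(col) sum(!is.finite(col)), integer(1))
}

# Hypothetical helper: keep only rows whose numeric columns are all finite
# and which have no NA in any column (including factor/character columns).
drop_nonfinite <- function(df) {
  num_cols <- vapply(df, is.numeric, logical(1))
  finite_rows <- Reduce(`&`, lapply(df[num_cols], is.finite), rep(TRUE, nrow(df)))
  df[finite_rows & complete.cases(df), , drop = FALSE]
}

# Small toy data frame: row 2 contains both an Inf and a NaN.
toy <- data.frame(Class = c(1, 1, 2), V1 = c(11, Inf, 67), V2 = c(0.2, NaN, 0.1))
print(count_nonfinite(toy))
clean <- drop_nonfinite(toy)
print(clean)
```

Running count_nonfinite() first shows which columns are affected, which is useful before deciding whether to drop rows or repair values.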
For example, with the following data (note the Inf in V2 on the first row):

```
Class V1     V2 V3  V4 V5 V6 V7 V8           V9
    1 11    Inf  4 232 23  2  2 34  0.205567767
    1 11    123  4 232 23  1  2 34  0.162357601
    1 13    123  4 232 23  1  2 34 -0.002739357
    1 13    123  4 232 23  1  2 34  0.186989878
    2 67     14  4 232 67  1  2 34  0.109398677
    2 67     14  4 232 67  2  2 34   0.18491187
    2 67     14  4 232 34  2  2 34  0.098728256
    2 44 769.03  4  21 34  2  2 34  0.204405869
    2 44     34  4  11 34  1  2 34  0.218426408
```
```r
# Classification: the following error is raised.
rf_model <- randomForest(as.factor(Class) ~ ., data = data, importance = TRUE, proximity = TRUE)
Error in randomForest.default(m, y, ...) :
  NA/NaN/Inf in foreign function call (arg 1)

# Regression: the same error is raised, along with a warning.
rf_model <- randomForest(Class ~ ., data = data, importance = TRUE, proximity = TRUE)
Error in randomForest.default(m, y, ...) :
  NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning message:
In randomForest.default(m, y, ...) :
  The response has five or fewer unique values.  Are you sure you want to do regression?
```

So please check your data very carefully. The warning appears because Class is numeric with only two unique values, so randomForest treats the problem as regression; wrapping it in as.factor() requests classification instead.
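A sketch that reproduces the example above, assuming the table is typed in by hand as a data frame named data. It locates the row containing the Inf, drops it, and converts Class to a factor so randomForest does classification rather than regression. The model fit itself runs only if the randomForest package is installed; the data checks use base R alone.

```r
# The example table from above, entered by hand.
data <- data.frame(
  Class = c(1, 1, 1, 1, 2, 2, 2, 2, 2),
  V1 = c(11, 11, 13, 13, 67, 67, 67, 44, 44),
  V2 = c(Inf, 123, 123, 123, 14, 14, 14, 769.03, 34),
  V3 = rep(4, 9),
  V4 = c(232, 232, 232, 232, 232, 232, 232, 21, 11),
  V5 = c(23, 23, 23, 23, 67, 67, 34, 34, 34),
  V6 = c(2, 1, 1, 1, 1, 2, 2, 2, 1),
  V7 = rep(2, 9),
  V8 = rep(34, 9),
  V9 = c(0.205567767, 0.162357601, -0.002739357, 0.186989878, 0.109398677,
         0.18491187, 0.098728256, 0.204405869, 0.218426408)
)

# Only two distinct response values: a numeric Class means regression (and the warning).
print(length(unique(data$Class)))

# TRUE for rows where every predictor is finite (no NA, NaN, Inf or -Inf).
finite_rows <- rowSums(!is.finite(as.matrix(data[-1]))) == 0
print(which(!finite_rows))  # the offending row(s)

# Drop the offending rows and make the response a factor for classification.
clean <- data[finite_rows, ]
clean$Class <- factor(clean$Class)

# Fit only when the package is available.
if (requireNamespace("randomForest", quietly = TRUE)) {
  set.seed(1)
  rf_model <- randomForest::randomForest(Class ~ ., data = clean,
                                         importance = TRUE, proximity = TRUE)
  print(rf_model)
}
```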