I have been training some models and when I try to use Support Vector Machines with Radial Basis Function Kernel I get the following error:
> svmRFit <- train(x = Fraud_trainX,
+ y = Fraud_trainY,
+ method = "svmRadial",
+ metric = "ROC",
+ preProc = c("center", "scale"),
+ tuneLength = 15,
+ trControl = ctrl)
Error in if (any(co)) { : missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In FUN(newX[, i], ...) : NAs introduced by coercion
2: In FUN(newX[, i], ...) : NAs introduced by coercion
3: In FUN(newX[, i], ...) : NAs introduced by coercion
4: In FUN(newX[, i], ...) : NAs introduced by coercion
5: In FUN(newX[, i], ...) : NAs introduced by coercion
Called from: .local(x, ...)
Browse[1]>
Here is a summary of my training data:
summary(Fraud_trainX)
Make AccidentArea PolicyType VehicleCategory
Pontiac :1412 Rural: 597 SedC :2109 Sedan :3660
Toyota :1177 Urban:5186 SedL :1857 Sport :1994
Honda :1054 SedA :1551 Utility: 129
Mazda : 883 SpoC : 126
Chevrolet: 637 Utility - All Perils: 113
Accura : 183 UtiCL : 16
(Other) : 437 (Other) : 11
BasePolicy WeekOfMonthClaimed Age PolicyNumber RepNumber
AP:1675 Min. :1.000 Min. :16.00 Min. : 2 Min. : 1.000
C :2246 1st Qu.:2.000 1st Qu.:31.00 1st Qu.: 3866 1st Qu.: 4.000
L :1862 Median :3.000 Median :38.00 Median : 7757 Median : 9.000
Mean :2.703 Mean :40.71 Mean : 7754 Mean : 8.473
3rd Qu.:4.000 3rd Qu.:49.00 3rd Qu.:11556 3rd Qu.:12.000
Max. :5.000 Max. :80.00 Max. :15420 Max. :16.000
NA's :130
Deductible DriverRating ClaimSize Month
Min. :400.0 Min. :1.000 Min. : 0 Min. : 1.000
1st Qu.:400.0 1st Qu.:1.000 1st Qu.: 4112 1st Qu.: 3.000
Median :400.0 Median :3.000 Median : 8150 Median : 6.000
Mean :407.3 Mean :2.488 Mean : 22921 Mean : 6.384
3rd Qu.:400.0 3rd Qu.:3.000 3rd Qu.: 43446 3rd Qu.: 9.000
Max. :700.0 Max. :4.000 Max. :141394 Max. :12.000
NA's :4
WeekOfMonth DayOfWeek DayOfWeekClaimed MonthClaimed
Min. :1.000 Min. :1.000 Min. :1.000 Min. : 1.000
1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.: 3.000
Median :3.000 Median :4.000 Median :3.000 Median : 6.000
Mean :2.776 Mean :3.844 Mean :2.824 Mean : 6.345
3rd Qu.:4.000 3rd Qu.:5.000 3rd Qu.:4.000 3rd Qu.: 9.000
Max. :5.000 Max. :7.000 Max. :7.000 Max. :12.000
Sex MaritalStatus Fault VehiclePrice
Min. :0.0000 Min. :1.000 Min. :0.0000 Min. :1.000
1st Qu.:1.0000 1st Qu.:1.000 1st Qu.:0.0000 1st Qu.:2.000
Median :1.0000 Median :2.000 Median :0.0000 Median :2.000
Mean :0.8406 Mean :1.698 Mean :0.2722 Mean :2.783
3rd Qu.:1.0000 3rd Qu.:2.000 3rd Qu.:1.0000 3rd Qu.:3.000
Max. :1.0000 Max. :3.000 Max. :1.0000 Max. :6.000
Days_Policy_Accident Days_Policy_Claim PastNumberOfClaims AgeOfVehicle
Min. :0.000 Min. :1.000 Min. :0.000 Min. :0.000
1st Qu.:4.000 1st Qu.:3.000 1st Qu.:0.000 1st Qu.:6.000
Median :4.000 Median :3.000 Median :1.000 Median :7.000
Mean :3.971 Mean :2.993 Mean :1.333 Mean :6.592
3rd Qu.:4.000 3rd Qu.:3.000 3rd Qu.:2.000 3rd Qu.:8.000
Max. :4.000 Max. :3.000 Max. :3.000 Max. :8.000
AgeOfPolicyHolder PoliceReportFiled WitnessPresent AgentType
Min. :1.00 Min. :0.00000 Min. :0.00000 Min. :0.00000
1st Qu.:5.00 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000
Median :6.00 Median :0.00000 Median :0.00000 Median :0.00000
Mean :5.89 Mean :0.02957 Mean :0.00536 Mean :0.01504
3rd Qu.:7.00 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000
Max. :9.00 Max. :1.00000 Max. :1.00000 Max. :1.00000
NumberOfSuppliments AddressChange_Claim NumberOfCars
Min. :0.000 Min. :0.0000 Min. :0.0000
1st Qu.:0.000 1st Qu.:0.0000 1st Qu.:0.0000
Median :1.000 Median :0.0000 Median :0.0000
Mean :1.163 Mean :0.1757 Mean :0.1027
3rd Qu.:2.000 3rd Qu.:0.0000 3rd Qu.:0.0000
Max. :3.000 Max. :3.0000 Max. :3.0000
The structure of the training data:
str(Fraud_trainX)
'data.frame': 5783 obs. of 32 variables:
$ Make : Factor w/ 19 levels "Accura","BMW",..: 7 18 6 7 6 6 6 3 10 7 ...
$ AccidentArea : Factor w/ 2 levels "Rural","Urban": 2 1 2 1 2 2 2 2 2 2 ...
$ PolicyType : Factor w/ 8 levels "SedA","SedC",..: 5 3 3 2 3 3 1 2 3 2 ...
$ VehicleCategory : Factor w/ 3 levels "Sedan","Sport",..: 2 2 2 1 2 2 1 1 2 1 ...
$ BasePolicy : Factor w/ 3 levels "AP","C","L": 2 3 3 2 3 3 1 2 3 2 ...
$ WeekOfMonthClaimed : num 4 1 3 1 1 5 1 1 1 4 ...
$ Age : num 34 65 28 NA 61 38 41 28 40 21 ...
$ PolicyNumber : num 2 4 13 14 15 16 17 18 21 27 ...
$ RepNumber : num 15 4 11 12 3 16 15 6 3 1 ...
$ Deductible : num 400 400 400 400 400 400 400 400 400 400 ...
$ DriverRating : num 4 2 1 3 1 1 4 1 1 2 ...
$ ClaimSize : num 59294 7584 59748 82212 59552 ...
$ Month : int 1 6 1 1 1 8 4 7 4 3 ...
$ WeekOfMonth : int 3 2 3 5 5 4 4 5 2 3 ...
$ DayOfWeek : int 3 6 5 5 1 2 4 7 5 4 ...
$ DayOfWeekClaimed : int 1 5 5 3 4 1 3 3 2 4 ...
$ MonthClaimed : int 1 7 1 2 2 8 5 8 5 6 ...
$ Sex : int 1 1 1 1 1 1 1 0 1 1 ...
$ MaritalStatus : int 1 2 2 1 2 1 2 2 2 2 ...
$ Fault : int 0 1 0 1 0 0 0 1 0 0 ...
$ VehiclePrice : int 6 2 6 6 6 6 6 2 2 3 ...
$ Days_Policy_Accident: int 4 4 4 4 4 4 4 4 4 4 ...
$ Days_Policy_Claim : int 3 3 3 3 3 3 3 3 3 3 ...
$ PastNumberOfClaims : int 0 1 1 0 0 0 0 0 1 3 ...
$ AgeOfVehicle : int 6 8 7 0 8 6 7 7 8 5 ...
$ AgeOfPolicyHolder : int 5 8 5 1 8 6 6 5 6 4 ...
$ PoliceReportFiled : int 1 1 0 0 0 0 0 0 0 0 ...
$ WitnessPresent : int 0 0 0 0 0 0 0 0 0 0 ...
$ AgentType : int 0 0 0 0 0 0 0 0 0 0 ...
$ NumberOfSuppliments : int 0 3 0 0 0 0 0 1 3 3 ...
$ AddressChange_Claim : int 0 0 0 0 0 0 0 0 0 0 ...
$ NumberOfCars : int 0 0 0 0 0 0 0 0 0 0 ...
The response variable:
summary(Fraud_trainY)
No Yes
5440 343
Here are the resampling index and the control object that I use for model training:
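# 5-fold cross-validation repeated 2 times, stratified on the response
indx <- createMultiFolds(Fraud_trainY, k = 5, times = 2)
str(indx)
ctrl <- trainControl(method = "repeatedcv",
                     index = indx,
                     summaryFunction = twoClassSummary,  # reports ROC, Sens, Spec
                     sampling = "up",                    # up-sample the minority class
                     classProbs = TRUE)                  # class probabilities, needed for ROC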
And here is the model call:
svmRFit <- train(x = Fraud_trainX,
y = Fraud_trainY,
method = "svmRadial",
metric = "ROC",
preProc = c("center", "scale"),
tuneLength = 15,
trControl = ctrl)
I have already tried loading the pROC library, but it made no difference. I have also removed the rows containing NA from all the variables, and the response variable already has the levels "No" and "Yes". I ran the same training with C5.0 ("C5.0"), neural networks ("nnet") and logistic regression ("multinom"); all of them accepted the data and returned a fitted model. This is the only model that raises an error.
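Since the summary above still lists NA's entries (for example in Age), it may be worth re-checking the predictors right before calling train. The following is only a minimal sketch on a made-up data frame: `toy` and its values are hypothetical stand-ins for Fraud_trainX, and only base R functions are used.

```r
# Hypothetical stand-in for Fraud_trainX (values invented for illustration)
toy <- data.frame(
  Make = factor(c("Honda", "Mazda", "Honda")),
  Age  = c(34, NA, 61),
  Sex  = c(1L, 0L, 1L)
)

# First class of every column; factor columns are not numeric
col_classes <- sapply(toy, function(col) class(col)[1])

# Number of missing values per column
na_counts <- colSums(is.na(toy))

# Names of the columns that are not numeric
non_numeric <- names(toy)[!sapply(toy, is.numeric)]

print(col_classes)
print(na_counts)
print(non_numeric)
```

Running the same three checks on the real predictors would show whether any NA values or factor columns are still present.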
As @AlvaroMartinez commented, the error was that I had some variables stored as factor. When I changed those variables to integer, the model worked correctly.
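For anyone hitting the same message, here is a minimal sketch of the conversion on a made-up data frame. `toy` and its values are hypothetical, not the real data. It also shows a likely reason for the "NAs introduced by coercion" warnings: a data frame that mixes factors and numbers becomes a character matrix when converted with as.matrix. Whether train performs exactly that conversion internally is an assumption on my part. Two options are shown: plain integer codes (what I did) and dummy variables built with base R's model.matrix. Integer codes impose an arbitrary order on nominal levels such as Make, so dummy variables may be the safer choice for a distance-based kernel.

```r
# Hypothetical stand-in for Fraud_trainX (values invented for illustration)
toy <- data.frame(
  Make         = factor(c("Honda", "Mazda", "Toyota", "Honda")),
  AccidentArea = factor(c("Urban", "Rural", "Urban", "Urban")),
  Age          = c(34, 65, 28, 61)
)

# A mixed data frame turns into a character matrix, not a numeric one
print(typeof(as.matrix(toy)))

# Option A: replace each factor column with its integer level codes
is_fac  <- sapply(toy, is.factor)
toy_int <- toy
toy_int[is_fac] <- lapply(toy_int[is_fac], as.integer)
str(toy_int)

# Option B: expand factors into 0/1 dummy columns and drop the intercept column
toy_dummy <- model.matrix(~ ., data = toy)[, -1, drop = FALSE]
print(colnames(toy_dummy))
```

Either `toy_int` or `toy_dummy` is fully numeric, which is the form that worked for me when passed as `x` to train with method = "svmRadial".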