I have a personal dataset that I split into the variable to predict and the predictors. Here is the code:
library(Cubist)
str(A)
'data.frame': 6038 obs. of 3 variables:
$ ads_return_count : num 7 10 10 4 10 10 10 10 10 9 ...
$ actual_cpc : num 0.0678 0.3888 0.2947 0.0179 0.095 ...
$ is_user_agent_bot: Factor w/ 1 level "False": 1 1 1 1 1 1 1 1 1 1 ...
cubist(A[,c("ads_return_count","is_user_agent_bot")],A[,"actual_cpc"])
And I am getting the following error:
cubist code called exit with value 1
Error in strsplit(tmp, "\"")[[1]] : subscript out of bounds
Is there something I am missing?
Simulate some data to make a reproducible example:
A=data.frame(ads_return_count=sample(100,10,TRUE), actual_cpc=runif(100), is_user_agent_bot=factor(rep("False",100)))
cubist(A[,c("ads_return_count","is_user_agent_bot")],A[,"actual_cpc"])
cubist code called exit with value 1
Error in strsplit(tmp, "\"")[[1]] : subscript out of bounds
Great, now we're on the same page.
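What bothers me is that one of the predictors, is_user_agent_bot, is a factor with only one level ("False"). The outcome, actual_cpc, is fine, but a predictor that never varies carries no information for the model, and it appears to be what trips up cubist here. Let's try the same call with a two-level predictor: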
> A2=data.frame(ads_return_count=sample(100,10,TRUE), actual_cpc=runif(100), is_user_agent_bot=factor(sample(c("True","False"),100,TRUE)))
> cubist(A2[,c("ads_return_count","is_user_agent_bot")],A2[,"actual_cpc"])
Call:
cubist.default(x = A2[, c("ads_return_count", "is_user_agent_bot")], y =
A2[, "actual_cpc"])
Number of samples: 100
Number of predictors: 2
Number of committees: 1
Number of rules: 1
I would say this was an uninformative error message from cubist, caused by a factor predictor that has only a single level.
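One possible workaround is to drop predictors that never vary before fitting. The following is a minimal sketch in base R, not something Cubist provides: `drop_constant_predictors` is a hypothetical helper name, and the assumption that removing such columns is enough to avoid the error is based only on the experiment above.

```r
# Hypothetical helper (not part of Cubist): keep only the columns that
# take more than one distinct non-missing value.
drop_constant_predictors <- function(x) {
  keep <- vapply(
    x,
    function(col) length(unique(col[!is.na(col)])) > 1,
    logical(1)
  )
  x[, keep, drop = FALSE]
}

# Simulated data with the same shape as the question's data frame.
set.seed(1)
A <- data.frame(
  ads_return_count  = sample(100, 100, TRUE),
  actual_cpc        = runif(100),
  is_user_agent_bot = factor(rep("False", 100))
)

# is_user_agent_bot has a single level, so it is filtered out here.
predictors <- drop_constant_predictors(
  A[, c("ads_return_count", "is_user_agent_bot")]
)
names(predictors)

# With the Cubist package installed, the fit would then be:
# library(Cubist)
# cubist(predictors, A[, "actual_cpc"])
```

Using `drop = FALSE` keeps the result a data frame even when only one predictor survives, which is the form passed as the first argument in the calls above.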