
Is there a package that I can use to get rules for a target outcome in R?


For example, in the data set below I would like to find the values of each variable that yield a preset value of "percentage". For instance, if "percentage" needs to be >= 0.7, the outcome should be something like:

birds >= 5, 1 < wolfs <= 3, 2 <= snakes <= 4

Example data set:

dat <- read.table(text = "birds    wolfs     snakes  percentage
3         8          7         0.50
1         2          3         0.33
5         1          1         0.66
6         3          2         0.80
5         2          4         0.74", header = TRUE)

I can't use decision trees because I have a large data frame and can't view the whole tree properly. I tried the *arules* package, but it requires all variables to be factors, and I have a mixed dataset of factor, logical and continuous variables. I would like to keep the continuous variables continuous. Also, "percentage" is the only variable I want to optimize. The code I wrote with the *arules* package is this:

library(arules)
dat$birds<-as.factor(dat$birds)
dat$wolfs<-as.factor(dat$wolfs)
dat$snakes<-as.factor(dat$snakes)
dat$percentage<-as.factor(dat$percentage)
rules<-apriori(dat, parameter = list(minlen=2, supp=0.005, conf=0.8))

Thank you


Solution

  • I may have misunderstood the question, but to get the maximum value of each variable under the restriction percentage >= 0.7, you could do this:

    lapply(dat[dat$percentage >= 0.7, 1:3], max)
    
    $birds
    [1] 6
    
    $wolfs
    [1] 3
    
    $snakes
    [1] 4
    

    Edit after comment:

    So perhaps this is more what you are looking for:

    > as.data.frame(lapply(dat[dat$percentage >= 0.7,1:3], function(y) c(min(y), max(y))))
      birds wolfs snakes
    1     5     2      2
    2     6     3      4
    

    This gives the min and max of each variable, i.e. its range, among the rows where percentage >= 0.7.

    If this is completely missing what you are trying to achieve, I may not be the right person to help you.

    Edit #2:

    > as.data.frame(lapply(dat[dat$percentage >= 0.7,1:3], function(y) c(min(y), max(y), length(y), length(y)/nrow(dat))))
      birds wolfs snakes
    1   5.0   2.0    2.0
    2   6.0   3.0    4.0
    3   2.0   2.0    2.0
    4   0.4   0.4    0.4
    

    Row 1: min; Row 2: max; Row 3: number of observations meeting the condition; Row 4: proportion of observations meeting the condition (relative to total observations).
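
If the goal is output in the rule-like form shown in the question, the min/max approach above could be wrapped up roughly as follows. This is only a sketch: `range_rules` is a made-up helper name, not a function from any package, and it assumes the predictors of interest are numeric (factor and logical columns are simply skipped). Note that the resulting ranges only describe the rows that meet the threshold; they do not guarantee that every row inside those ranges meets it.

```r
# Example data from the question
dat <- read.table(text = "birds wolfs snakes percentage
3 8 7 0.50
1 2 3 0.33
5 1 1 0.66
6 3 2 0.80
5 2 4 0.74", header = TRUE)

# Hypothetical helper (not part of any package): describe the range of each
# numeric predictor among the rows where `target` >= `threshold`.
range_rules <- function(data, target, threshold) {
  # Rows meeting the condition; missing target values count as not meeting it
  keep <- !is.na(data[[target]]) & data[[target]] >= threshold
  predictors <- setdiff(names(data), target)
  hits <- data[keep, predictors, drop = FALSE]
  # Assumption: only numeric columns get a min/max rule
  hits <- hits[vapply(hits, is.numeric, logical(1))]
  rules <- vapply(names(hits), function(v) {
    sprintf("%s <= %s <= %s", min(hits[[v]]), v, max(hits[[v]]))
  }, character(1))
  list(
    rule  = paste(rules, collapse = ", "),  # ranges as one rule string
    n     = sum(keep),                      # number of matching rows
    share = mean(keep)                      # proportion of all rows
  )
}

res <- range_rules(dat, "percentage", 0.7)
print(res$rule)
# [1] "5 <= birds <= 6, 2 <= wolfs <= 3, 2 <= snakes <= 4"
```

Here `res$n` and `res$share` correspond to rows 3 and 4 of the table in Edit #2. If no row meets the threshold, `min` and `max` return `Inf`/`-Inf` with a warning, so a real implementation would probably want to check for that case first.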