For example, in the data set below I would like to find the values of each variable that yield a pre-set value of "percentage". For instance, if I need "percentage" to be >= 0.7, the outcome should be something like:
birds >= 5, 1 < wolfs <= 3, 2 <= snakes <= 4
Example data set:
dat <- read.table(text = "birds wolfs snakes percentage
3 8 7 0.50
1 2 3 0.33
5 1 1 0.66
6 3 2 0.80
5 2 4 0.74", header = TRUE)
I can't use decision trees because my data frame is large and I can't view the whole tree properly. I tried the *arules* package, but it requires all variables to be factors, and I have a mixed data set of factor, logical and continuous variables. I would like to keep the variables, including the independent variable, continuous. Also, "percentage" is the only variable that I want to optimize.
The code that I wrote with the *arules* package is this:
library(arules)
# arules needs factors, so convert every column first
dat$birds <- as.factor(dat$birds)
dat$wolfs <- as.factor(dat$wolfs)
dat$snakes <- as.factor(dat$snakes)
dat$percentage <- as.factor(dat$percentage)
rules <- apriori(dat, parameter = list(minlen = 2, supp = 0.005, conf = 0.8))
Thank you
I may have misunderstood the question, but to get the maximum value of each variable under the restriction percentage >= 0.7, you could do this:
lapply(dat[dat$percentage >= 0.7, 1:3], max)
$birds
[1] 6
$wolfs
[1] 3
$snakes
[1] 4
Edit after comment:
So perhaps this is more what you are looking for:
> as.data.frame(lapply(dat[dat$percentage >= 0.7,1:3], function(y) c(min(y), max(y))))
  birds wolfs snakes
1     5     2      2
2     6     3      4
This gives the min (row 1) and max (row 2) of each variable, i.e. its range, over the rows where percentage >= 0.7.
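Here is a minimal, self-contained sketch of the same idea, rebuilding `dat` from the question so it runs on its own. Since the question mentions a mix of factor, logical and continuous columns, it restricts the summary to numeric columns. That restriction, and the names `threshold`, `keep`, `num_cols` and `ranges`, are my own assumptions and not something from the question.

```r
dat <- read.table(text = "birds wolfs snakes percentage
3 8 7 0.50
1 2 3 0.33
5 1 1 0.66
6 3 2 0.80
5 2 4 0.74", header = TRUE)

threshold <- 0.7                      # assumed cut-off, taken from the example
keep <- dat$percentage >= threshold   # rows meeting the condition

# Numeric predictor columns only (factor and logical columns are skipped)
num_cols <- setdiff(names(dat)[sapply(dat, is.numeric)], "percentage")

# One column per variable, rows labelled "min" and "max"
ranges <- sapply(dat[keep, num_cols, drop = FALSE], range)
rownames(ranges) <- c("min", "max")
ranges
```

Note that this only describes the range of each variable among the qualifying rows, one variable at a time. It does not guarantee that every combination inside those ranges reaches the threshold.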
If this is completely missing what you are trying to achieve, I may not be the right person to help you.
Edit #2:
> as.data.frame(lapply(dat[dat$percentage >= 0.7,1:3], function(y) c(min(y), max(y), length(y), length(y)/nrow(dat))))
  birds wolfs snakes
1   5.0   2.0    2.0
2   6.0   3.0    4.0
3   2.0   2.0    2.0
4   0.4   0.4    0.4
Row 1: min
Row 2: max
Row 3: number of observations meeting the condition
Row 4: proportion of observations meeting the condition (relative to total observations)
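If remembering which row is which gets tedious, the rows can be labelled directly. This is only a sketch of one way to do it: the helper `summarise_var` and the labels `min`, `max`, `n` and `share` are names I made up for illustration, and `sapply` returns a matrix here and not a data frame.

```r
dat <- read.table(text = "birds wolfs snakes percentage
3 8 7 0.50
1 2 3 0.33
5 1 1 0.66
6 3 2 0.80
5 2 4 0.74", header = TRUE)

keep <- dat$percentage >= 0.7   # rows meeting the condition
n_total <- nrow(dat)            # total number of observations

# Hypothetical helper: a named vector so the result carries row labels
summarise_var <- function(y) {
  c(min = min(y), max = max(y), n = length(y), share = length(y) / n_total)
}

# sapply() binds the named vectors into a matrix, one column per variable
res <- sapply(dat[keep, 1:3], summarise_var)
res
```

The numbers are the same as in the unlabelled output above; only the row names change.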