
How to subset rules by a lhs itemMatrix object in arules R?


I generated rules restricted to three distinct items on the rhs (in this case, the three values of fallotrayec):

rules <- apriori(df, parameter=list(minlen=3,maxlen=6,supp=0.015,conf=0.6,maxtime=120),
                appearance = list(rhs=c("fallotrayec_f1", "fallotrayec_f2", "fallotrayec_f3")))

From these I can extract three subsets of rules, keeping only the rules with the highest support for each rhs value:

rules_f1 <- subset(rules, subset = (rhs %in% "fallotrayec_f1") & support > 0.4)
rules_f2 <- subset(rules, subset = (rhs %in% "fallotrayec_f2") & support > 0.18)
rules_f3 <- subset(rules, subset = (rhs %in% "fallotrayec_f3") & support > 0.015)

After that, I extract the lhs of each of these three subsets:

lhs_f1 <- lhs(rules_f1)
lhs_f2 <- lhs(rules_f2)
lhs_f3 <- lhs(rules_f3)

Finally, I want to remove from rules_f1 all rules whose lhs also appears in rules_f2 or rules_f3, so I tried:

rules_f1_new <- subset(rules_f1, !(lhs %in% lhs_f2) | !(lhs %in% lhs_f3) )

But it keeps returning the following error:

Error in validObject(x, complete = TRUE) : 
invalid class “itemMatrix” object: item labels not unique

I'm using RStudio v. 1.1.423 and R v. 3.4.3. Unfortunately the data I'm using is protected, but I think the code above can be reproduced with one of the demo datasets. I also loaded my dataset with rm.duplicates=TRUE. Thanks in advance.


Solution

  • Looks like you found a bug in arules. Run the following to replace the faulty method, then check whether the output is as expected.

    setMethod("%in%", signature(x = "itemMatrix", table = "itemMatrix"),
      function(x, table) !is.na(match(x, table))
    )
    

    This will be fixed in the next release of arules. Thanks for the example. It was very helpful in finding the problem.
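
For readers without the original data, here is a minimal sketch of the same filtering on the Adult demo dataset that ships with arules. The thresholds, the two income items standing in for the three fallotrayec values, and the variable names are all assumptions made for illustration. It calls match() directly, the same function the replacement method above relies on, so it should not depend on whether the %in% method has been patched.

```r
library(arules)

# Adult is a demo transactions dataset bundled with arules.
data("Adult")

# Mine rules with only the two income items allowed on the rhs.
# Thresholds are arbitrary and chosen only to keep the rule set small.
rules <- apriori(
  Adult,
  parameter  = list(supp = 0.05, conf = 0.2, minlen = 2, maxlen = 3),
  appearance = list(rhs = c("income=small", "income=large"), default = "lhs"),
  control    = list(verbose = FALSE)
)

# Split the rules by rhs item.
rules_small <- subset(rules, subset = rhs %in% "income=small")
rules_large <- subset(rules, subset = rhs %in% "income=large")

# lhs itemsets of the rules we want to exclude.
lhs_large <- lhs(rules_large)

# match() returns NA where an lhs itemset of rules_small has no
# identical itemset in lhs_large; keep exactly those rules.
keep <- is.na(match(lhs(rules_small), lhs_large))
rules_small_new <- rules_small[keep]

length(rules_small)
length(rules_small_new)
```

With the method above in place, the original call should work as well. Note that if the goal is to drop every rule whose lhs appears in either of the other subsets, the two conditions need to be combined with & rather than |, for example subset(rules_f1, !(lhs %in% lhs_f2) & !(lhs %in% lhs_f3)).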