
The union and intersection of rules in arules do not make arithmetic sense


The simple inclusion-exclusion identity length(union(setA, setB)) = length(setA) + length(setB) - length(intersect(setA, setB)) does not hold for my rule sets.

What am I missing here?

Here is a summary of my two rule sets:

> setA
set of 625 rules
 
> setB
set of 622 rules 

> union(setA,setB)
set of 626 rules 

> intersect(setA,setB)
set of 174 rules 

> setdiff(setA,setB)
set of 451 rules 

> setdiff(setB,setA)
set of 448 rules 
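A quick check of the reported lengths (the numbers are copied from the output above) shows which result is out of line: both setdiff() results agree with the reported intersection, while the union does not.

```r
# Lengths as reported by the R session above
n_A <- 625
n_B <- 622
n_intersect <- 174
n_union_reported <- 626

# Inclusion-exclusion: |A union B| = |A| + |B| - |A intersect B|
n_union_expected <- n_A + n_B - n_intersect
print(n_union_expected)   # 1073, not the reported 626

# The set differences are consistent with the reported intersection
print(n_A - n_intersect)  # 451, matches setdiff(setA, setB)
print(n_B - n_intersect)  # 448, matches setdiff(setB, setA)
```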
 

Exported rules: setA, setB

R model files: setA.Rdata, setB.Rdata


Solution

  • This is a tricky problem.

    library(arules)  # provides itemLabels(), recode(), lhs(), rhs()
    
    load("setA.Rdata")
    load("setB.Rdata")
    
    all.equal(itemLabels(setA), itemLabels(setB))
    ## [1] "Lengths (261, 263) differ (string compare on first 261)"
    ## [2] "167 string mismatches"
    

    Your two rule sets use different item encodings: their item labels differ in number (261 vs. 263) and in order. This happens if you mine them from different datasets and do not make sure that the item encoding is the same.
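Why a different encoding breaks set operations can be illustrated without arules. The sketch below is a simplified model, not the arules internals (arules roughly identifies an item by its position in an internal sparse binary matrix): each itemset is stored as integer IDs into its own label vector. The label vectors and itemsets are made up for illustration.

```r
# Two hypothetical item codings: different order, and B has an extra item
labels_A <- c("bread", "milk", "butter")
labels_B <- c("milk", "bread", "eggs", "butter")

# {bread, butter} encoded as item IDs in coding A and in coding B
x_A <- c(1L, 3L)
x_B <- c(2L, 4L)
# {milk, eggs} encoded in coding B
y_B <- c(1L, 3L)

# Comparing raw IDs gives the wrong answer in both directions
print(identical(x_A, x_B))  # FALSE, although both are {bread, butter}
print(identical(x_A, y_B))  # TRUE, although {bread, butter} is not {milk, eggs}

# Comparing through the labels gives the right answer
print(setequal(labels_A[x_A], labels_B[x_B]))  # TRUE
print(setequal(labels_A[x_A], labels_B[y_B]))  # FALSE
```

Matching by IDs both misses rules that are really shared and reports false matches, which is why the intersection and union counts cannot be trusted.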

    arules assumes, without checking, that both sets are encoded the same way. I think a check needs to be added.
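Until such a check exists in arules, you can guard set operations by hand. `assert_same_coding()` below is a hypothetical helper, not part of arules; it only compares two character vectors of item labels, which for rule sets you would obtain with `itemLabels()`.

```r
# Hypothetical helper (not part of arules): stop if two item codings differ
assert_same_coding <- function(labels_x, labels_y) {
  if (!identical(labels_x, labels_y)) {
    stop("item codings differ: recode both objects to a common set of item labels first")
  }
  invisible(TRUE)
}

# Usage with arules rule sets (assumes setA and setB are loaded):
# assert_same_coding(itemLabels(setA), itemLabels(setB))
```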

    You can fix your rule sets by recoding them to use the same itemLabels:

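    # Common coding: every item label that occurs in either rule set
    common_labels <- union(itemLabels(setA), itemLabels(setB))
    
    # Rebuild each rule set with its LHS and RHS recoded to the common labels;
    # passing the quality slot keeps support, confidence, etc.
    setA_fixed <- new("rules",
      lhs = recode(lhs(setA), itemLabels = common_labels),
      rhs = recode(rhs(setA), itemLabels = common_labels),
      quality = quality(setA)
    )
    
    setB_fixed <- new("rules",
      lhs = recode(lhs(setB), itemLabels = common_labels),
      rhs = recode(rhs(setB), itemLabels = common_labels),
      quality = quality(setB)
    )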
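Conceptually, recoding maps each old item ID to its label and then looks that label up in the common label vector. The base R sketch below shows this principle on plain integer vectors; it is an illustration, not the implementation of arules' `recode()`, and `recode_ids()` and the label vectors are hypothetical.

```r
# Hypothetical helper: translate item IDs from one coding to another via labels
recode_ids <- function(ids, old_labels, new_labels) {
  new_ids <- match(old_labels[ids], new_labels)
  if (anyNA(new_ids)) stop("some items are missing from new_labels")
  sort(new_ids)
}

labels_A <- c("bread", "milk", "butter")
labels_B <- c("milk", "bread", "eggs", "butter")
common <- union(labels_A, labels_B)  # "bread" "milk" "butter" "eggs"

# {bread, butter} has IDs c(1, 3) in coding A and c(2, 4) in coding B
print(recode_ids(c(1L, 3L), labels_A, common))  # 1 3
print(recode_ids(c(2L, 4L), labels_B, common))  # 1 3
```

After recoding, the same itemset has the same IDs in both sets, so ID-based comparisons become label-based comparisons.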
    

    Now you get the expected result:

    length(union(setA_fixed, setB_fixed))
    ## [1] 626
    length(c(setA_fixed, setB_fixed)) - length(intersect(setA_fixed, setB_fixed))
    ## [1] 626
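Recoding does not add or drop rules, and c() simply concatenates, so length(c(setA_fixed, setB_fixed)) is 625 + 622 = 1247. The second result therefore implies the size of the corrected intersection. The arithmetic, using only the lengths reported in this thread:

```r
n_A <- 625            # length(setA), unchanged by recoding
n_B <- 622            # length(setB), unchanged by recoding
n_union_fixed <- 626  # length(union(setA_fixed, setB_fixed))

# length(c(setA_fixed, setB_fixed)) is n_A + n_B, so the intersection must be:
n_intersect_fixed <- (n_A + n_B) - n_union_fixed
print(n_intersect_fixed)  # 621, compared with 174 before recoding
```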