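The simple maths of length(union(setA, setB)) = length(setA) + length(setB) - length(intersect(setA, setB)) does not hold
What am I missing here? With the numbers below, 625 + 622 - 174 = 1073, but union() reports 626.
This is the summary of my two rule sets.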
> setA
set of 625 rules
> setB
set of 622 rules
> union(setA,setB)
set of 626 rules
> intersect(setA,setB)
set of 174 rules
> setdiff(setA,setB)
set of 451 rules
> setdiff(setB,setA)
set of 448 rules
Exported Rules
RModel
This is a tricky problem.
library(arules)
load("setA.Rdata")
load("setB.Rdata")
all.equal(itemLabels(setA), itemLabels(setB))
[1] "Lengths (261, 263) differ (string compare on first 261)"
[2] "167 string mismatches"
You have two rule sets that use different item encodings: the item labels differ both in number (261 vs. 263) and in order. This happens when you mine rules from different datasets without making sure that the item encoding is the same.
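To make the effect concrete, here is a toy illustration in plain base R. It does not use arules at all; the label vectors and the itemset are made up, and representing an itemset as a vector of positions is only a simplified picture of how items are stored. It shows how the same itemset looks different once two encodings disagree:

```r
# Hypothetical item labels for two datasets: same items, different positions.
labels_a <- c("bread", "butter", "milk")
labels_b <- c("butter", "milk", "bread")

# The same itemset {bread, milk}, stored as positions in each label vector.
codes_a <- match(c("bread", "milk"), labels_a)
codes_b <- match(c("bread", "milk"), labels_b)

# Comparing the raw codes treats the identical itemsets as different ...
raw_equal <- setequal(codes_a, codes_b)

# ... while decoding through each set's own labels recovers the match.
decoded_equal <- setequal(labels_a[codes_a], labels_b[codes_b])

print(c(raw_equal = raw_equal, decoded_equal = decoded_equal))
```

Any set operation that compares the stored codes directly will therefore miscount the common rules whenever the label vectors are not identical.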
arules assumes that both sets are encoded in the same way and does not check this. I think a check needs to be added.
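Until such a check exists, something like the following could be run before any set operation. This is only a sketch: `check_same_encoding` is a hypothetical helper name, not an arules function, and it relies only on `itemLabels()` returning the label vector of each object.

```r
library(arules)

# Hypothetical helper (not part of arules): stop early when two arules
# objects do not share exactly the same item labels in the same order.
check_same_encoding <- function(x, y) {
  if (!identical(itemLabels(x), itemLabels(y))) {
    stop("item encodings differ; recode both sets to common itemLabels first")
  }
  invisible(TRUE)
}
```

Calling `check_same_encoding(setA, setB)` on the two rule sets from this report would be expected to stop with an error, since their label vectors differ in length.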
You can fix your rule sets by recoding them to use the same itemLabels:
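# Common label vector covering every item that occurs in either rule set
common_labels <- union(itemLabels(setA), itemLabels(setB))
# Rebuild each rule set with both sides recoded to the common labels
setA_fixed <- new("rules",
  lhs = recode(lhs(setA), itemLabels = common_labels),
  rhs = recode(rhs(setA), itemLabels = common_labels)
)
setB_fixed <- new("rules",
  lhs = recode(lhs(setB), itemLabels = common_labels),
  rhs = recode(rhs(setB), itemLabels = common_labels)
)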
Now you get the expected result:
length(union(setA_fixed, setB_fixed))
[1] 626
length(c(setA_fixed, setB_fixed)) - length(intersect(setA_fixed, setB_fixed))
[1] 626
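Note that rebuilding the rules from `lhs` and `rhs` alone does not carry over the quality measures (support, confidence, and so on). If you need them, a small wrapper along these lines may help. It is a sketch under assumptions: `recode_rules` is a hypothetical name, not an arules function, and it assumes the `quality` slot can be copied unchanged because recoding does not alter the number or order of the rules.

```r
library(arules)

# Hypothetical helper (not part of arules): recode both sides of a rule set
# to a common label vector and copy the quality measures across.
recode_rules <- function(rules, labels) {
  new("rules",
    lhs = recode(lhs(rules), itemLabels = labels),
    rhs = recode(rhs(rules), itemLabels = labels),
    quality = quality(rules)
  )
}
```

For the two sets here, the call would be `recode_rules(setA, union(itemLabels(setA), itemLabels(setB)))`, and likewise for `setB`.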