I am generating rules from my data and noticed a few duplicated rules. These rules have the same support, lift and count values but different confidence and coverage values.
I initially thought this was due to whitespace in one of the product names, but I trimmed and cleaned the product info before mining for rules.
# GENERATE RULES
rules1 <- apriori(transactions,
                  parameter = list(
                    support = supportLevels[3],
                    confidence = confidenceLevels[9],
                    minlen = 2,
                    target = "rules"
                  )
)
# VIEW THE ASSOCIATION RULES
inspect(sort(rules1,
             by = "lift", # sort from strongest to weakest rules
             decreasing = TRUE))
Below you can see the first two rules which are duplicated/symmetrical but have different confidence values.
Unfortunately I cannot share my dataset as it's proprietary, and I could not replicate this with the Groceries dataset in arules.
Does anyone have an idea why I could get different confidence but same support and lift for these rules?
This follows directly from the definitions of the measures. The two rules X => Y and Y => X are both created from the same frequent itemset, given by the union of X and Y.
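Support and lift are symmetric, so they are identical for both rules:
supp(X => Y) = supp(Y => X) = supp(X and Y)
lift(X => Y) = lift(Y => X) = supp(X and Y) / (supp(X) * supp(Y))
Confidence is not symmetric, because it divides by the support of the left-hand side only:
conf(X => Y) = supp(X and Y) / supp(X)
conf(Y => X) = supp(X and Y) / supp(Y)
So if supp(X) is different from supp(Y), then conf(X => Y) will be different from conf(Y => X). Coverage is the support of the left-hand side, supp(X) for the first rule and supp(Y) for the second, so it differs for the same reason.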
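To make this concrete, here is a minimal sketch in base R that computes the measures by hand for both directions of a rule. The six toy transactions and the item names X and Y are made up for illustration and are not taken from the question's data; the formulas are the standard definitions given above.

```r
# Hypothetical toy data: each row is a transaction, each column an item.
tx <- matrix(
  c(1, 1,
    1, 1,
    1, 0,
    1, 0,
    0, 1,
    0, 0),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("X", "Y"))
) == 1

supp_x  <- mean(tx[, "X"])               # share of transactions containing X
supp_y  <- mean(tx[, "Y"])               # share of transactions containing Y
supp_xy <- mean(tx[, "X"] & tx[, "Y"])   # share containing both X and Y

measures <- data.frame(
  rule       = c("X => Y", "Y => X"),
  support    = c(supp_xy, supp_xy),                    # symmetric
  confidence = c(supp_xy / supp_x, supp_xy / supp_y),  # depends on the LHS
  coverage   = c(supp_x, supp_y),                      # support of the LHS
  lift       = supp_xy / (supp_x * supp_y),            # symmetric
  count      = sum(tx[, "X"] & tx[, "Y"])              # symmetric
)
print(measures)
```

In this toy example support, lift and count come out the same for both rows, while confidence and coverage differ because X appears in more transactions than Y. That is the same pattern as in the question, so the two rules are not duplicates but the two directions of one itemset.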