I have run the Apriori algorithm on my dataset. However, the rules I get are inverted repetitions of each other, that is:
inspect(head(rules))
    lhs                        rhs                        support    confidence lift count
[1] {252-ON-OFF}            => {L30-ATLANTIC}            0.04545455 1          22   1
[2] {L30-ATLANTIC}          => {252-ON-OFF}              0.04545455 1          22   1
[3] {252-ON-OFF}            => {M01-A molle biconiche}   0.04545455 1          22   1
[4] {M01-A molle biconiche} => {252-ON-OFF}              0.04545455 1          22   1
[5] {L30-ATLANTIC}          => {M01-A molle biconiche}   0.04545455 1          22   1
[6] {M01-A molle biconiche} => {L30-ATLANTIC}            0.04545455 1          22   1
As can be seen, rule 1 and rule 2 are the same, with only the LHS and RHS interchanged. Is there any way to remove such rules from the final result?
I saw this post link, but the proposed solution is not correct. I also saw this post link and tried these two solutions:
Solution A:
rules <- rules[!is.redundant(rules)]
but the result is always the same:
inspect(head(rules))
    lhs                        rhs                        support    confidence lift count
[1] {252-ON-OFF}            => {L30-ATLANTIC}            0.04545455 1          22   1
[2] {L30-ATLANTIC}          => {252-ON-OFF}              0.04545455 1          22   1
[3] {252-ON-OFF}            => {M01-A molle biconiche}   0.04545455 1          22   1
[4] {M01-A molle biconiche} => {252-ON-OFF}              0.04545455 1          22   1
[5] {L30-ATLANTIC}          => {M01-A molle biconiche}   0.04545455 1          22   1
[6] {M01-A molle biconiche} => {L30-ATLANTIC}            0.04545455 1          22   1
Solution B:
# find redundant rules
subset.matrix <- is.subset(rules, rules)
subset.matrix[lower.tri(subset.matrix, diag=T)]
redundant <- colSums(subset.matrix, na.rm=T) > 1
which(redundant)
rules.pruned <- rules[!redundant]
inspect(rules.pruned)
     lhs    rhs                           support    confidence lift count
[1]  {}  => {BRC-BRC}                     0.04545455 0.04545455 1    1
[2]  {}  => {111-WINK}                    0.04545455 0.04545455 1    1
[3]  {}  => {305-INGRAM HIGH}             0.04545455 0.04545455 1    1
[4]  {}  => {952-REVERS}                  0.04545455 0.04545455 1    1
[5]  {}  => {002-LC2}                     0.09090909 0.09090909 1    2
[6]  {}  => {252-ON-OFF}                  0.04545455 0.04545455 1    1
[7]  {}  => {L30-ATLANTIC}                0.04545455 0.04545455 1    1
[8]  {}  => {M01-A molle biconiche}       0.04545455 0.04545455 1    1
[9]  {}  => {678-Portovenere}             0.04545455 0.04545455 1    1
[10] {}  => {251-MET T.}                  0.04545455 0.04545455 1    1
[11] {}  => {324-D.S.3}                   0.04545455 0.04545455 1    1
[12] {}  => {L04-YUME}                    0.04545455 0.04545455 1    1
[13] {}  => {969-Lubekka}                 0.04545455 0.04545455 1    1
[14] {}  => {000-FUORI LISTINO}           0.04545455 0.04545455 1    1
[15] {}  => {007-LC7}                     0.04545455 0.04545455 1    1
[16] {}  => {341-COS}                     0.04545455 0.04545455 1    1
[17] {}  => {601-ROBIE 1}                 0.04545455 0.04545455 1    1
[18] {}  => {608-TALIESIN 2}              0.04545455 0.04545455 1    1
[19] {}  => {610-ROBIE 2}                 0.04545455 0.04545455 1    1
[20] {}  => {615-HUSSER}                  0.04545455 0.04545455 1    1
[21] {}  => {831-DAKOTA}                  0.04545455 0.04545455 1    1
[22] {}  => {997-997}                     0.27272727 0.27272727 1    6
[23] {}  => {412-CAB}                     0.09090909 0.09090909 1    2
[24] {}  => {S01-A doghe senza movimenti} 0.09090909 0.09090909 1    2
[25] {}  => {708-Genoa}                   0.09090909 0.09090909 1    2
[26] {}  => {998-998}                     0.54545455 0.54545455 1    12
Has anyone had the same problem and found a way to solve it? Thanks for your help.
The issue is your dataset, not the algorithm. In the result, many rules have a count of 1 (the item combination occurs in only one transaction; a support of 0.04545455 with a count of 1 corresponds to just 22 transactions), and confidence is 1 for both the rule and its "inverse." This means that you need more data and should increase the minimum support.
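As a rough illustration of the effect of the minimum support, the sketch below mines the Groceries data that ships with arules at two thresholds. Groceries is only a stand-in for your own transactions, and the threshold values (0.001 and 0.01 for support, 0.5 for confidence) are arbitrary choices for the example, not recommendations. Setting minlen = 2 is an extra assumption on my part: it skips rules with an empty LHS such as {} => {998-998}.

```r
library(arules)
data(Groceries)  # stand-in dataset; substitute your own transactions object

# Low minimum support: many rules, some backed by only a few transactions
rules_low <- apriori(Groceries,
                     parameter = list(support = 0.001, confidence = 0.5, minlen = 2),
                     control = list(verbose = FALSE))

# Higher minimum support: fewer rules, each backed by more transactions
rules_high <- apriori(Groceries,
                      parameter = list(support = 0.01, confidence = 0.5, minlen = 2),
                      control = list(verbose = FALSE))

# Compare how many rules survive each threshold
length(rules_low)
length(rules_high)

# Number of transactions behind each of the surviving rules
summary(quality(rules_high)$count)
```

With only 22 transactions, almost any threshold will leave rules that rest on one or two baskets, so collecting more data matters more than tuning the parameter.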
If you still want to get rid of such "duplicate" rules efficiently, then you can do the following:
> library(arules)
> data(Groceries)
> rules <- apriori(Groceries, parameter = list(support = 0.001))
> rules
set of 410 rules
> gi <- generatingItemsets(rules)
> d <- which(duplicated(gi))
> rules[-d]
set of 385 rules
The code only keeps the first rule of each set of rules with exactly the same items.
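Because only the first rule per item set survives, the order of the rules decides which one is kept. A possible variation, sketched below, sorts the rules first so that the retained rule is the one with the highest confidence; sorting by confidence is my assumption about what "best" means here, and any other quality measure could be used instead. The sketch also indexes with !duplicated(gi) rather than -which(duplicated(gi)), since negative indexing with an empty index vector would select nothing when there are no duplicates at all.

```r
library(arules)
data(Groceries)

rules <- apriori(Groceries,
                 parameter = list(support = 0.001, confidence = 0.8),
                 control = list(verbose = FALSE))

# Put the preferred rule first within each group of same-item rules
rules_sorted <- sort(rules, by = "confidence")

# Item set (LHS and RHS combined) that generated each rule
gi <- generatingItemsets(rules_sorted)

# Keep the first rule for every distinct item set; logical indexing
# also behaves correctly when no item set is duplicated
rules_unique <- rules_sorted[!duplicated(gi)]

length(rules)
length(rules_unique)
```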