
Removing inverted (reverse/duplicate) rules from Apriori result in R


I have implemented the Apriori algorithm on my dataset. However, the rules I get are inverted repetitions, that is:

inspect(head(rules))
    lhs                        rhs                     support    confidence lift count
[1] {252-ON-OFF}            => {L30-ATLANTIC}          0.04545455 1          22   1    
[2] {L30-ATLANTIC}          => {252-ON-OFF}            0.04545455 1          22   1    
[3] {252-ON-OFF}            => {M01-A molle biconiche} 0.04545455 1          22   1    
[4] {M01-A molle biconiche} => {252-ON-OFF}            0.04545455 1          22   1    
[5] {L30-ATLANTIC}          => {M01-A molle biconiche} 0.04545455 1          22   1    
[6] {M01-A molle biconiche} => {L30-ATLANTIC}          0.04545455 1          22   1 

As can be seen, rules 1 and 2 are the same, except that the LHS and RHS are interchanged. Is there any way to remove such rules from the final result?

I saw this post (link), but the proposed solution is not correct. I also saw this other post (link) and tried these two solutions:

Solution A:

rules <- rules[!is.redundant(rules)]

but the result is unchanged:

inspect(head(rules))
    lhs                        rhs                     support    confidence lift count
[1] {252-ON-OFF}            => {L30-ATLANTIC}          0.04545455 1          22   1    
[2] {L30-ATLANTIC}          => {252-ON-OFF}            0.04545455 1          22   1    
[3] {252-ON-OFF}            => {M01-A molle biconiche} 0.04545455 1          22   1    
[4] {M01-A molle biconiche} => {252-ON-OFF}            0.04545455 1          22   1    
[5] {L30-ATLANTIC}          => {M01-A molle biconiche} 0.04545455 1          22   1    
[6] {M01-A molle biconiche} => {L30-ATLANTIC}          0.04545455 1          22   1 

Solution B:

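# find redundant rules
# logical matrix: entry [i, j] is TRUE if the items of rule i are a subset of the items of rule j
subset.matrix <- is.subset(rules, rules)
# as written, this line only prints the lower triangle; it does not modify subset.matrix
subset.matrix[lower.tri(subset.matrix, diag = TRUE)]
# flag rule j if any rule other than itself is a subset of it (the diagonal counts once)
redundant <- colSums(subset.matrix, na.rm = TRUE) > 1
which(redundant)
rules.pruned <- rules[!redundant]
inspect(rules.pruned)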
     lhs    rhs                           support    confidence lift count
[1]  {}  => {BRC-BRC}                     0.04545455 0.04545455 1     1   
[2]  {}  => {111-WINK}                    0.04545455 0.04545455 1     1   
[3]  {}  => {305-INGRAM HIGH}             0.04545455 0.04545455 1     1   
[4]  {}  => {952-REVERS}                  0.04545455 0.04545455 1     1   
[5]  {}  => {002-LC2}                     0.09090909 0.09090909 1     2   
[6]  {}  => {252-ON-OFF}                  0.04545455 0.04545455 1     1   
[7]  {}  => {L30-ATLANTIC}                0.04545455 0.04545455 1     1   
[8]  {}  => {M01-A molle biconiche}       0.04545455 0.04545455 1     1   
[9]  {}  => {678-Portovenere}             0.04545455 0.04545455 1     1   
[10] {}  => {251-MET T.}                  0.04545455 0.04545455 1     1   
[11] {}  => {324-D.S.3}                   0.04545455 0.04545455 1     1   
[12] {}  => {L04-YUME}                    0.04545455 0.04545455 1     1   
[13] {}  => {969-Lubekka}                 0.04545455 0.04545455 1     1   
[14] {}  => {000-FUORI LISTINO}           0.04545455 0.04545455 1     1   
[15] {}  => {007-LC7}                     0.04545455 0.04545455 1     1   
[16] {}  => {341-COS}                     0.04545455 0.04545455 1     1   
[17] {}  => {601-ROBIE 1}                 0.04545455 0.04545455 1     1   
[18] {}  => {608-TALIESIN 2}              0.04545455 0.04545455 1     1   
[19] {}  => {610-ROBIE 2}                 0.04545455 0.04545455 1     1   
[20] {}  => {615-HUSSER}                  0.04545455 0.04545455 1     1   
[21] {}  => {831-DAKOTA}                  0.04545455 0.04545455 1     1   
[22] {}  => {997-997}                     0.27272727 0.27272727 1     6   
[23] {}  => {412-CAB}                     0.09090909 0.09090909 1     2   
[24] {}  => {S01-A doghe senza movimenti} 0.09090909 0.09090909 1     2   
[25] {}  => {708-Genoa}                   0.09090909 0.09090909 1     2   
[26] {}  => {998-998}                     0.54545455 0.54545455 1    12 

Has anyone had the same problem and found a way to solve it? Thanks for your help.


Solution

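  • The issue is your dataset, not the algorithm. In the result, you can see that many rules have a count of 1 (the item combination occurs only once in the transactions) and that both the rule and its "inverse" have a confidence of 1. Since confidence(X => Y) = support({X, Y}) / support(X), two items that each appear only in the single transaction containing both get confidence 1 in both directions. This means that you need more data and a higher minimum support.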

    If you still want to get rid of such "duplicate" rules efficiently, then you can do the following:

    > library(arules)
    > data(Groceries)
    > rules <- apriori(Groceries, parameter = list(support = 0.001))
    > rules
    set of 410 rules
    
    > gi <- generatingItemsets(rules)
    > d <- which(duplicated(gi))
    > rules[-d]
    set of 385 rules 
    

    The code keeps only the first rule in each group of rules that contain exactly the same items (LHS and RHS combined).
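To make the point about count 1 and confidence 1 concrete, here is a minimal base-R sketch (it does not use arules) that computes support, confidence and lift by hand. The data is hypothetical: 22 made-up transactions in which items "A" and "B" occur only once, together, chosen so that the numbers loosely mirror the question's output (support 1/22, confidence 1, lift 22).

```r
# Hypothetical toy data: 22 transactions; "A" and "B" occur only in transaction 1.
transactions <- c(list(c("A", "B")), rep(list("C"), 21))

# support(X): fraction of transactions that contain every item in X
support <- function(items) {
  mean(vapply(transactions, function(t) all(items %in% t), logical(1)))
}

# confidence(X => Y) = support(X union Y) / support(X)
confidence <- function(lhs, rhs) support(union(lhs, rhs)) / support(lhs)

# lift(X => Y) = confidence(X => Y) / support(Y)
lift <- function(lhs, rhs) confidence(lhs, rhs) / support(rhs)

support(c("A", "B"))   # 1/22, about 0.04545455
confidence("A", "B")   # 1
confidence("B", "A")   # 1, the "inverse" rule scores identically
lift("A", "B")         # 22
```

Both directions get identical measures because a single co-occurrence is the only evidence for either item, which is why more data and a higher minimum support help.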
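The deduplication idea in the answer (keep the first rule among those generated from the same itemset) can also be illustrated without arules. The sketch below is a hypothetical base-R version: each rule is represented as plain LHS/RHS character vectors (taken from the question's first six rules), an order-independent key is built from the union of LHS and RHS, and only rules whose key has not been seen before are kept. The names itemset_key() and keep_first_per_itemset() are illustrative only.

```r
# The question's first six rules as plain LHS/RHS item vectors
lhs <- list("252-ON-OFF", "L30-ATLANTIC", "252-ON-OFF",
            "M01-A molle biconiche", "L30-ATLANTIC", "M01-A molle biconiche")
rhs <- list("L30-ATLANTIC", "252-ON-OFF", "M01-A molle biconiche",
            "252-ON-OFF", "M01-A molle biconiche", "L30-ATLANTIC")

# Order-independent key for the items a rule is generated from (LHS union RHS).
# The separator "\u001f" is assumed not to occur inside item labels.
itemset_key <- function(l, r) paste(sort(unique(c(l, r))), collapse = "\u001f")

# TRUE for the first rule of each group sharing the same itemset, FALSE otherwise
keep_first_per_itemset <- function(lhs, rhs) {
  keys <- mapply(itemset_key, lhs, rhs, USE.NAMES = FALSE)
  !duplicated(keys)
}

keep <- keep_first_per_itemset(lhs, rhs)
which(keep)   # 1 3 5
```

With an actual arules rules object, the same logic can be written with logical indexing, rules[!duplicated(generatingItemsets(rules))]. Unlike rules[-d], this also behaves correctly when there are no duplicates: then d is integer(0), and rules[-d] selects no rules at all instead of all of them.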