One-step vs. two-step association rule mining in arules - Why does it differ?

To my understanding, the Apriori algorithm works by first finding all frequent itemsets that meet the support threshold and then generating strong association rules from those frequent itemsets that also meet the minimum confidence.

Hence I would expect that in the R package arules:

txs <- as(inputDataTable, "transactions")
itemsets <- apriori(txs, parameter = list(support = 0.05, confidence = 0.7, target = "frequent itemsets"))
rules <- ruleInduction(itemsets)

and

txs <- as(inputDataTable, "transactions")
rules <- apriori(txs, parameter = list(support = 0.05, confidence = 0.7, target = "rules"))

would lead to the same rules. However, more rules are found in the second example, and I can't understand why.

Can anybody explain why this is? I have been trying to get my head around it for a while now.

Solution

OK, pretty straightforward now that I know what the problem was.

For anybody who encounters a similar problem: confidence should (of course) be set at the ruleInduction() step and not when finding the frequent itemsets, where only support is relevant. Because I didn't pass a value for confidence to ruleInduction(), its default confidence of 0.8 was used, and thus fewer rules were found.

So doing:

txs <- as(inputDataTable, "transactions")
itemsets <- apriori(txs, parameter = list(support = 0.05, target = "frequent itemsets"))
rules <- ruleInduction(itemsets, confidence = 0.7)

and

txs <- as(inputDataTable, "transactions")
rules <- apriori(txs, parameter = list(support = 0.05, confidence = 0.7, target = "rules"))

does lead to the same result. :)
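
For anyone who wants to see the effect of the default threshold without my data, here is a minimal sketch. It rests on assumptions that are not part of my original setup: it uses the Adult data set that ships with arules in place of inputDataTable, a support of 0.5 to keep the run small, and it passes the transactions to ruleInduction() explicitly.

```r
library(arules)

# Assumption: the Adult transactions bundled with arules stand in for
# as(inputDataTable, "transactions"); they are not the data from the question.
data("Adult")
txs <- Adult

# Step 1: mine frequent itemsets. Only support matters at this stage.
# Assumption: support = 0.5 is chosen for illustration only.
itemsets <- apriori(txs,
                    parameter = list(support = 0.5, target = "frequent itemsets"),
                    control = list(verbose = FALSE))

# Step 2a: no confidence given, so ruleInduction() uses its default of 0.8.
rules_default <- ruleInduction(itemsets, transactions = txs)

# Step 2b: confidence lowered explicitly to 0.7.
rules_07 <- ruleInduction(itemsets, transactions = txs, confidence = 0.7)

# Compare how many rules each threshold yields.
length(rules_default)
length(rules_07)
```

Every rule that passes a confidence of 0.8 also passes 0.7, so the first count can never exceed the second. That gap is exactly what I was seeing.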