Search code examples
python-3.xdata-miningapriorifpgrowth

Why does Apriori run faster than FP-Growth in this implementation?


I am using Christian Borlget's FP-Growth and Apriori packages to find frequent item sets and association rules. According to his paper, fp-growth performs better than apriori on all cases.

Running FP-Growth on my machine, on a ~36MB(~500,000 lines) csv file, shows:

from fim import apriori, fpgrowth
s = time.time()
fp = fpgrowth(tracts, target='r', supp=0.0065, zmin=2, report="C,S") # tracts is a list of lists
e = time.time()
print(e - s)

41.10438871383667

Whereas Apriori results in:

s = time.time()
ap = apriori(tracts, target='r', supp=0.0065, zmin=2, report="C,S")
e = time.time()
print(e - s)

34.50810647010803

What am I missing on the implementation?


Solution

  • There is no guarantee that either is always better than the other. Apriori can be very fast if no items satisfy the minimum support, for example. When your longest itemsets are 2 itemsets, a quite naive version can be fine. Apriori pruning as well as the fptree only begin to shine when you go for (more interesting!) longer itemsets, which may require choosing a low support parameter.