I just realized that the performance of Frequent Itemsets is strongly correlated with number of item per basket. I run the following code:
import datetime
from datetime import datetime
from orangecontrib.associate.fpgrowth import *
%%time
T = [[1,3, 4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21]]
itemsets = frequent_itemsets(T, 1)
a=list(itemsets)
As I increased the number of item in T the running time increased as follow:
Item running time
21 3.39s
22 9.14s
23 15.8s
24 37.4s
25 1.2 min
26 10 min
27 35 min
28 95 min
For 31 item sets it took 10 hours without returning any results. I am wondering if there is anyway to run it for more than 31 items in reasonable time? In this case I just need pairwise item set (A-->B) while my understanding is frequent_itemsets count all possible combination and that is probably why it is running time is highly correlated with number of items. Is there any way to tell the method to limit the number item to count like instead of all combination just pairwise?
You could use other software that allows to specify constraints on the itemsets such as a length constraints. For example, you can consider the SPMF data mining library (disclosure: I am the founder), which offers about 120 algorithms for itemset and pattern mining. It will let you use FPGrowth with a length constraint. So you could for example mine only the patterns with 2 items or 3 items. You could also try other features such as mining association rules. That software works on text file and can be called for the command line, and is quite fast.