weka, data-mining, apriori

How to find frequent itemsets irrespective of attribute name?


I have a dataset (a CSV file) in which I want to find frequent itemsets using the Apriori algorithm.

col1, col2, col3
bread, butter,?
coke, bread, butter

I am using WEKA for this purpose. The output is in the following format:

...
Large Itemsets L(2):
col1=bread  col2= butter 1
col1=coke  col2= bread 1
col1=coke  col3= butter 1
col2= bread  col3= butter 1
...

But the output that I want is:

bread, butter 2

Basically, the output above should be independent of the column that each item belongs to. How can I achieve this kind of output?


Solution

  • Format your data differently.

    Weka expects each column to correspond to one product, with the value being t/f (for true, false) depending on whether the transaction contains that product. Then you get itemsets of the kind milk=t -> butter=t.

    See the .arff examples included with Weka.

    I think I saw an ELKI example using your input format.
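
As a rough illustration of the reformatting step, here is a minimal Python sketch that turns the row-per-transaction CSV from the question into one column per item. The function name `to_item_columns` and its `present`/`absent` parameters are hypothetical, and the choice of `f` for absent items simply follows the t/f convention mentioned above; check the .arff examples shipped with Weka for the exact encoding your version expects.

```python
import csv
import io


def to_item_columns(rows, present="t", absent="f", missing="?"):
    """Convert rows of item names into one-column-per-item rows.

    Hypothetical helper: the markers used for present/absent items
    are assumptions, not something prescribed by Weka itself.
    """
    # Strip whitespace and ignore empty or missing ("?") cells, so the
    # original column an item appeared in no longer matters.
    transactions = [
        {cell.strip() for cell in row if cell.strip() and cell.strip() != missing}
        for row in rows
    ]
    # One output column per distinct item, in a stable order.
    items = sorted(set().union(*transactions))
    table = [
        [present if item in t else absent for item in items]
        for t in transactions
    ]
    return items, table


# The sample data from the question.
sample = "col1, col2, col3\nbread, butter,?\ncoke, bread, butter\n"
reader = csv.reader(io.StringIO(sample))
next(reader)  # skip the col1, col2, col3 header
items, table = to_item_columns(reader)

# Write the converted data back out as CSV text.
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(items)
writer.writerows(table)
converted = out.getvalue()
print(converted)
```

With the data in this shape, bread and butter each have their own column, so an itemset containing both is counted once per transaction regardless of which original column the values came from.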