python pandas machine-learning data-mining orange

Problems while extracting association rules with Orange?

I have a dataset with the dimensions (878049, 6).

It looks like this:

I would like to extract association rules that link the Category column with the other columns. Following the documentation, I tried this with Orange-Associate:

In:

import Orange
data = Orange.data.Table("data.csv")

In:

data.domain.attributes

Out:

(DiscreteVariable('Category', values=['ARSON', 'ASSAULT', 'BAD CHECKS', 'BRIBERY', 'BURGLARY', ...]),
 DiscreteVariable('Descript', values=['ABANDONMENT OF CHILD', 'ABORTION', 'ACCESS CARD INFORMATION, PUBLICATION OF', 'ACCESS CARD INFORMATION, THEFT OF', 'ACCIDENTAL BURNS', ...]),
 DiscreteVariable('DayOfWeek', values=['Friday', 'Monday', 'Saturday', 'Sunday', 'Thursday', ...]),
 DiscreteVariable('PdDistrict', values=['BAYVIEW', 'CENTRAL', 'INGLESIDE', 'MISSION', 'NORTHERN', ...]),
 DiscreteVariable('Resolution', values=['ARREST, BOOKED', 'ARREST, CITED', 'CLEARED-CONTACT JUVENILE FOR MORE INFO', 'COMPLAINANT REFUSES TO PROSECUTE', 'DISTRICT ATTORNEY REFUSES TO PROSECUTE', ...]))

In:

from orangecontrib.associate.fpgrowth import *  

X, mapping = OneHot.encode(data, include_class=True)

X

Out:
array([[False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       ..., 
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False]], dtype=bool)

In:

 sorted(mapping.items())

Out:

[(0, (0, 0)),
 (1, (0, 1)),
 (2, (0, 2)),
 (3, (0, 3)),
 (4, (0, 4)),
 (5, (0, 5)),
 (6, (0, 6)),
 (7, (0, 7)),
....
 (950, (4, 15)),
 (951, (4, 16))]
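
Reading this output, each key of mapping appears to be a column of X, and each value a (variable index, value index) pair. The plain-Python sketch below decodes the first few entries by hand. The variables list is a hand-copied excerpt of the domain printed above (only the visible values of Category), and decode_item is a hypothetical helper, not part of Orange.

```python
# Hand-copied excerpt of the domain printed above (assumption: limited to
# the Category values that are visible in the truncated output).
variables = [
    ("Category", ["ARSON", "ASSAULT", "BAD CHECKS", "BRIBERY", "BURGLARY"]),
]

# First entries of the printed mapping: item -> (variable index, value index).
mapping_excerpt = {0: (0, 0), 1: (0, 1), 2: (0, 2), 3: (0, 3), 4: (0, 4)}


def decode_item(item, mapping, variables):
    """Return 'Variable=value' for a one-hot column index (hypothetical helper)."""
    var_index, value_index = mapping[item]
    name, values = variables[var_index]
    return "{}={}".format(name, values[value_index])


decoded = [decode_item(i, mapping_excerpt, variables)
           for i in sorted(mapping_excerpt)]
print(decoded)
```

Under that reading, item 0 stands for Category=ARSON, item 1 for Category=ASSAULT, and so on.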

Then:

In:

itemsets = dict(frequent_itemsets(X, .4))

len(itemsets)

Out:

1 
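
A result of 1 is what the threshold predicts: with 878049 rows, a relative minimum support of .4 means an itemset has to occur in roughly 351220 rows. The plain-Python sketch below applies two thresholds to the absolute support counts printed in the update further down. count_frequent is a hypothetical helper, not part of Orange, and it only filters counts that were already computed.

```python
n_rows = 878049  # number of rows reported at the top of the question

# Absolute support counts copied from the output in the update further down.
support_counts = {
    frozenset({935}): 206403, frozenset({20}): 92304,
    frozenset({928}): 119908, frozenset({924}): 129211,
    frozenset({946}): 526790, frozenset({921}): 116707,
    frozenset({946, 932}): 93924, frozenset({919}): 121584,
    frozenset({932}): 157182, frozenset({21}): 126182,
    frozenset({922}): 125038, frozenset({16}): 174900,
    frozenset({929}): 105296, frozenset({918}): 133734,
    frozenset({16, 946}): 156586, frozenset({925}): 89431,
    frozenset({923}): 124965, frozenset({920}): 126810,
}


def count_frequent(counts, min_support, n_rows):
    """Count itemsets whose absolute support reaches min_support * n_rows."""
    threshold = min_support * n_rows
    return sum(1 for count in counts.values() if count >= threshold)


print(count_frequent(support_counts, 0.4, n_rows))  # 1
print(count_frequent(support_counts, 0.1, n_rows))  # 18
```

Only item 946 (526790 rows) clears the .4 threshold, which matches the single itemset returned here.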

In:

class_items = {item
               for item, var, _ in OneHot.decode(mapping, data, mapping)
               if var is data.domain.class_var}

In:

sorted(class_items)

Out:

[]

I believe the problem is that I did not load the Orange table correctly. How should I load the dataset with Orange in order to generate association rules?

Update

Following @K3---rnc's answer, I tried this:

itemsets = dict(frequent_itemsets(X, .1))
print (len(itemsets))
print( itemsets)
for itemset, _support in itemsets:
    print(' '.join('{}={}'.format(var.name, val)
                   for _, var, val in OneHot.decode(itemset, data, mapping)))

18
{frozenset({935}): 206403, frozenset({20}): 92304, frozenset({928}): 119908, frozenset({924}): 129211, frozenset({946}): 526790, frozenset({921}): 116707, frozenset({946, 932}): 93924, frozenset({919}): 121584, frozenset({932}): 157182, frozenset({21}): 126182, frozenset({922}): 125038, frozenset({16}): 174900, frozenset({929}): 105296, frozenset({918}): 133734, frozenset({16, 946}): 156586, frozenset({925}): 89431, frozenset({923}): 124965, frozenset({920}): 126810}

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-83-83a24c082126> in <module>()
      2 print (len(itemsets))
      3 print( itemsets)
----> 4 for itemset, _support in itemsets:
      5     print(' '.join('{}={}'.format(var.name, val)
      6                    for _, var, val in OneHot.decode(itemset, data, mapping)))

ValueError: not enough values to unpack (expected 2, got 1)
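
The ValueError comes from iterating over the dict directly: that yields only its keys (the frozensets), and a one-element frozenset cannot be unpacked into two names. Iterating over .items() yields (itemset, support) pairs instead. A minimal plain-Python sketch, using three of the itemsets printed above:

```python
# Three itemsets and their absolute supports, copied from the output above.
itemsets = {
    frozenset({946}): 526790,
    frozenset({16}): 174900,
    frozenset({16, 946}): 156586,
}

lines = []
# .items() yields (key, value) pairs; iterating the dict itself yields keys only.
for itemset, support in itemsets.items():
    items = ", ".join(str(item) for item in sorted(itemset))
    lines.append("{{{}}}: {}".format(items, support))

print("\n".join(lines))
```

In the loop from the traceback, the corresponding change would be `for itemset, _support in itemsets.items():`.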

However, I still have the same issue: I cannot extract the association rules.

Solution

You are trying to induce classification rules without having any class variable in your data domain. If you print data.domain, you will see you only have regular attributes and metas.

[Category, DayOfWeek, PdDistrict, Resolution] {Descript, Address}

To solve this, you need to set one of your attributes as a class variable.

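# Keep every attribute except the first, and use the first one (Category)
# as the class variable; metas are carried over unchanged.
new_domain = Orange.data.Domain(list(data.domain.attributes[1:]),
                                data.domain.attributes[0],
                                metas=data.domain.metas)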

This sets the 'Category' attribute as the class variable. You can choose a different class variable in the same way, by passing another attribute as the second argument and leaving it out of the attribute list. If you now print new_domain, you should see something like this:

[DayOfWeek, PdDistrict, Resolution | Category] {Descript, Address}
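
To illustrate how the split generalises to another class variable, the plain-Python sketch below mimics it on variable names only, so it runs without Orange. split_domain and describe are hypothetical helpers, and the bracketed string merely imitates the printed form of the domain shown above.

```python
def split_domain(attributes, class_name):
    """Split a list of names into (remaining attributes, class variable)."""
    if class_name not in attributes:
        raise ValueError("unknown variable: {}".format(class_name))
    remaining = [name for name in attributes if name != class_name]
    return remaining, class_name


def describe(attributes, class_name, metas):
    """Imitate the printed form of a domain: [attrs | class] {metas}."""
    return "[{} | {}] {{{}}}".format(
        ", ".join(attributes), class_name, ", ".join(metas))


# Names copied from the domain printed in this answer.
attributes = ["Category", "DayOfWeek", "PdDistrict", "Resolution"]
metas = ["Descript", "Address"]

remaining, class_name = split_domain(attributes, "Category")
print(describe(remaining, class_name, metas))

# Choosing a different class variable works the same way.
remaining2, class_name2 = split_domain(attributes, "Resolution")
print(describe(remaining2, class_name2, metas))
```

With a real table, the selected names would correspond to the variable objects passed to Orange.data.Domain as in the snippet above.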