Tags: algorithm, python-2.7, sequences, fpm

Extract specific objects from a list of sequences in Python


I implemented a frequent pattern mining (FPM) algorithm to find rules in activity data. I print the resulting itemsets like this:

for itemset in find_frequent_itemsets(dataset, 0.1, include_support=True):
    print itemset

The following is the output of the above code:

([u'Global Connect Village'], 28)
([u'Terminal 2', u'Global Connect Village'], 1)
([u'VivoCity', u'Global Connect Village'], 1)
([u'Universal Studios Singapore', u'VivoCity', u'Global Connect Village'], 1)
([u'Universal Studios Singapore', u'Global Connect Village'], 2)
([u'Orchard Gateway', u'Global Connect Village'], 2)
([u'Chinatown', u'Global Connect Village'], 2)
([u'Singapore Changi Airport (SIN)', u'Chinatown', u'Global Connect Village'], 2)
([u'Fragrance Hotel', u'Global Connect Village'], 2)
([u'Singapore Changi Airport (SIN)', u'Fragrance Hotel', u'Global Connect Village'], 1)
([u'Singapore', u'Global Connect Village'], 3)
([u'Singapore Changi Airport (SIN)', u'Singapore', u'Global Connect Village'], 1)
([u"McDonald's", u'Global Connect Village'], 4)
([u'Singapore Changi Airport (SIN)', u"McDonald's", u'Global Connect Village'], 1)

I want to extract only the itemsets that contain three or more objects, preferring those with higher support.


Solution

  • Just use filter and sorted:

    MIN_LOCS = 3
    itemset = find_frequent_itemsets(dataset, 0.1, include_support=True)
    # Keep itemsets with at least MIN_LOCS items, highest support first
    itemset = sorted(filter(lambda it: len(it[0]) >= MIN_LOCS, itemset),
                     key=lambda it: it[1], reverse=True)
    

    Then you can pick the top elements you want:

    itemset_top_5 = itemset[:5]
    

    If you also want to require a minimum support value, adapt the filter accordingly:

    MIN_SUPPORT = 2  # example threshold, choose your own
    itemset = sorted(filter(lambda it: len(it[0]) >= MIN_LOCS and it[1] >= MIN_SUPPORT, itemset),
                     key=lambda it: it[1], reverse=True)
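
To make the filtering and sorting steps concrete, here is a minimal self-contained sketch that runs on a few of the (items, support) pairs copied from the question instead of calling find_frequent_itemsets. The helper name select_itemsets and the MIN_SUPPORT value of 2 are assumptions made for illustration; they are not part of the original answer or of any library. It also shows heapq.nlargest from the standard library as one possible way to pick the top entries without sorting everything.

```python
import heapq

# A few (items, support) pairs copied from the output in the question.
itemsets = [
    ([u'Global Connect Village'], 28),
    ([u'Universal Studios Singapore', u'VivoCity', u'Global Connect Village'], 1),
    ([u'Universal Studios Singapore', u'Global Connect Village'], 2),
    ([u'Singapore Changi Airport (SIN)', u'Chinatown', u'Global Connect Village'], 2),
    ([u'Singapore Changi Airport (SIN)', u'Fragrance Hotel', u'Global Connect Village'], 1),
    ([u"McDonald's", u'Global Connect Village'], 4),
]

MIN_LOCS = 3     # minimum number of items per itemset
MIN_SUPPORT = 2  # hypothetical support threshold, pick your own


def select_itemsets(pairs, min_locs, min_support=0):
    """Keep itemsets with enough items and support, highest support first."""
    kept = [(items, support) for items, support in pairs
            if len(items) >= min_locs and support >= min_support]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Size filter only, then size filter plus minimum support.
by_size = select_itemsets(itemsets, MIN_LOCS)
by_size_and_support = select_itemsets(itemsets, MIN_LOCS, MIN_SUPPORT)

# Alternative for "top k": nlargest avoids sorting the full list.
top_1 = heapq.nlargest(1, by_size, key=lambda pair: pair[1])
```

With this sample, by_size holds the three three-item itemsets and by_size_and_support keeps only the Chinatown one, since it is the only three-item itemset with support of at least 2.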