python-2.7, key-value, intersection, defaultdict

Intersection of values in different combinations of multiple dictionaries (default dicts)


I am trying to build a table from a dataframe in Python that shows the total number of words shared between each pair of categories. To do this, I first built a defaultdict that has each category as the key and the list of words belonging to that category as the value.

Now, for each combination of two categories, I need to count the words they have in common and build a final table such as:

    A   B   C
A  10   2   1
B   2   5   2
C   1   2   3

The sample data that I am working with is:

Cat Item
A dog
A cat
A bear
A fish
A monkey
A tiger
A lion
A rabbit
A horse
A turtle
B dog
B cat
B flower
B plant
B bush
C dog
C flower
C plant

The code that I am currently working with is:

import pandas as pd
import numpy as np
from collections import defaultdict


inFile = r'\path\to\infile.csv'  # raw string, so '\t' is not read as a tab

data = pd.read_csv(inFile, sep='\t')
dicts = defaultdict(list)

for i, j in zip(data['Cat'],data['Item']):
    dicts[i].append(j)


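# Attempt at the intersection step. Note that set1 and set2 are both built
# from the same list, so each category is only ever compared with itself.
for k,v in dicts.iteritems():
    set1 = set(v)
    set2 = set(v)
    for k in set1.intersection(set2):
        print k,v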

After running the above, the resulting defaultdict (before intersection) is the following:

{'A':['dog','cat','bear','fish','monkey','tiger','lion','rabbit','horse','turtle'],'B':['dog','cat','flower','plant','bush'],'C':['dog','flower','plant']}

While researching this problem, I came across a solution that is a step in the right direction: it counts and groups values according to keys in multiple dicts. However, it does not take into account the values shared between each combination of keys of the dict.

I have also looked at some solutions for finding matching keys or values, but the majority of them only deal with two dictionaries and not multiple dictionaries.

Thus, I am still stuck on how to count and sum the total of common elements between each combination of keys within MULTIPLE dicts.


Solution

  • I have built the required dictionary below; you can format its data into a table. Use the & operator for set intersection, which is exactly what you need:

    >>> dicts = {'A':['dog','cat','bear','fish','monkey','tiger','lion','rabbit','horse','turtle'],'B':['dog','cat','flower','plant','bush'],'C':['dog','flower','plant']}
    >>> dicts.items()
    [('A', ['dog', 'cat', 'bear', 'fish', 'monkey', 'tiger', 'lion', 'rabbit', 'horse', 'turtle']), ('C', ['dog', 'flower', 'plant']), ('B', ['dog', 'cat', 'flower', 'plant', 'bush'])]
    >>> items = sorted(dicts.items())
    >>> res = {}
    >>> for i in range(len(items)) :
    ...     for j in range(i,len(items)) :
    ...         res[(items[i][0],items[j][0])] = len(set(items[i][1]) & set(items[j][1]))
    ...         res[(items[j][0],items[i][0])] = res[(items[i][0],items[j][0])]
    ...
    >>> res
    {('B', 'C'): 3, ('A', 'A'): 10, ('B', 'B'): 5, ('B', 'A'): 2, ('C', 'A'): 1, ('C', 'B'): 3, ('C', 'C'): 3, ('A', 'B'): 2, ('A', 'C'): 1}
    >>>
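
    For anyone who wants the table itself rather than the raw dictionary, here is a minimal sketch of one way to go from the (Cat, Item) rows to a printed matrix using only the standard library. The names rows, overlap_counts and format_table are made up for this illustration and are not part of the question or of any library; the rows are typed in by hand instead of read from the CSV. It stores each category as a set up front (defaultdict(set)), so the & operator can be applied directly.

```python
from collections import defaultdict
from itertools import combinations_with_replacement

# Sample (category, item) rows from the question, typed in by hand
# (assumption: in practice these would come from the CSV file).
rows = [
    ("A", "dog"), ("A", "cat"), ("A", "bear"), ("A", "fish"), ("A", "monkey"),
    ("A", "tiger"), ("A", "lion"), ("A", "rabbit"), ("A", "horse"), ("A", "turtle"),
    ("B", "dog"), ("B", "cat"), ("B", "flower"), ("B", "plant"), ("B", "bush"),
    ("C", "dog"), ("C", "flower"), ("C", "plant"),
]

# Group items per category as sets, so repeated items count only once.
groups = defaultdict(set)
for cat, item in rows:
    groups[cat].add(item)


def overlap_counts(groups):
    """Return {(cat1, cat2): number of shared items} for every ordered pair."""
    counts = {}
    # combinations_with_replacement also yields (A, A), (B, B), ...
    for a, b in combinations_with_replacement(sorted(groups), 2):
        shared = len(groups[a] & groups[b])
        counts[(a, b)] = shared
        counts[(b, a)] = shared  # the count is symmetric
    return counts


def format_table(counts, labels):
    """Render the counts as a plain-text matrix with row and column labels."""
    width = max(len(str(v)) for v in counts.values())
    width = max(width, max(len(label) for label in labels))
    lines = [" ".join([" " * width] + [label.rjust(width) for label in labels])]
    for a in labels:
        cells = [str(counts[(a, b)]).rjust(width) for b in labels]
        lines.append(" ".join([a.rjust(width)] + cells))
    return "\n".join(lines)


counts = overlap_counts(groups)
table = format_table(counts, sorted(groups))
print(table)
```

    With the sample data, the B/C cell comes out as 3 (dog, flower, plant), which matches the res dictionary above. The diagonal cells are simply the number of distinct items in each category.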