Search code examples
pythonduplicatesdefaultdict

How to remove duplicate values from defaultdict(int) in python?


I have a dictionary which consists of unique key value pairs as follows:

    edu_bg={1: 1, 2: 2, 3: 1, 4: 1, 5: 2, 6: 2}

I want to create a dictionary from above data so that similar records (which has similar values) are grouped together as follows:

   {(1, 3): 1, (5, 6): 1, (1, 4): 1, (2, 6): 1,(2, 5): 1, (3, 4): 1})

I tried to achieve above output using following code:

    myedu = defaultdict(int)
    for k,v in edu_bg.iteritems():
        for K,V in edu_bg.iteritems():
          if K == k and V == v:
              pass
          if K != k and V == v:
              myedu[(k,K)] += 1
          else:
              pass

However it has resulted in duplicate records as follows:

         defaultdict(<type 'int'>, {(1, 3): 1, (5, 6): 1, (4, 1): 1, (3, 1): 1, (5, 2): 1, (1, 4): 1, (2, 6): 1, (4, 3): 1, (6, 2): 1, (2, 5): 1, (3, 4): 1, (6, 5): 1})

I want to remove these duplicate values. Any advice on this problem is appreciated.


Solution

  • Instead of iterating over the cartesian product of every pair, which will iterated over exactly n^2 elements, you could just iterate over every possible combination, which will iterate over n(n-1)/2 elements. While the Big Oh complexity will be the same, the constant factors will be reduced significantly:

    >>> from collections import defaultdict
    >>> from itertools import combinations
    >>> myedu = defaultdict(int)
    >>> edu_bg={1: 1, 2: 2, 3: 1, 4: 1, 5: 2, 6: 2}
    >>> for k1,k2 in combinations(edu_bg,2):
    ...   if edu_bg[k1] == edu_bg[k2]:
    ...     myedu[(k1,k2)] += 1
    ... 
    >>> myedu
    defaultdict(<class 'int'>, {(2, 6): 1, (1, 3): 1, (5, 6): 1, (2, 5): 1, (3, 4): 1, (1, 4): 1})
    >>> 
    

    I should reiterate, though, this sounds like the XY problem...