
Python - separate duplicate objects into different list


Let's say I have this class:

class Spam(object):
    def __init__(self, a):
        self.a = a

And now I have these objects:

s1 = Spam((1, 1, 1, 4))
s2 = Spam((1, 2, 1, 4))
s3 = Spam((1, 2, 1, 4))
s4 = Spam((2, 2, 1, 4))
s5 = Spam((2, 1, 1, 8))
s6 = Spam((2, 1, 1, 8))
objects = [s1, s2, s3, s4, s5, s6]

After running some kind of method, I need two lists: one containing every object whose a attribute value is shared with at least one other object, and one containing the objects whose a attribute value is unique.

Like this:

dups = [s2, s3, s5, s6]
normal = [s1, s4]

So it is like finding duplicates, except that the first occurrence of each repeated a attribute value must also go into the duplicates list.

I have written this method and it seems to work, but it is quite ugly in my opinion (and probably not very efficient).

def eggs(objects):
    vals = []
    dups = []
    normal = []
    for obj in objects:
        if obj.a in vals:
            dups.append(obj)
        else:
            normal.append(obj)
            vals.append(obj.a)
    dups_vals = [o.a for o in dups]
    # separate again
    new_normal = []
    for n in normal:
        if n.a in dups_vals:
            dups.append(n)
        else:
            new_normal.append(n)
    return dups, new_normal

Can anyone suggest a more Pythonic approach to this problem?


Solution

  • I would group the objects in a dictionary, using the a attribute as the key, and then separate the groups by size: objects in a group with more than one member are duplicates, and objects in a group of exactly one are unique.

    import collections
    
    def separate_dupes(seq, key_func):
        d = collections.defaultdict(list)
        for item in seq:
            d[key_func(item)].append(item)
        dupes   = [item for v in d.values() for item in v if len(v) > 1]
        uniques = [item for v in d.values() for item in v if len(v) == 1]
        return dupes, uniques
    
    class Spam(object):
        def __init__(self, a):
            self.a = a
        # This method is not needed for the solution; it only makes the printed results readable.
        def __repr__(self):
            return "Spam({})".format(self.a)
    
    s1 = Spam((1, 1, 1, 4))
    s2 = Spam((1, 2, 1, 4))
    s3 = Spam((1, 2, 1, 4))
    s4 = Spam((2, 2, 1, 4))
    s5 = Spam((2, 1, 1, 8))
    s6 = Spam((2, 1, 1, 8))
    objects = [s1, s2, s3, s4, s5, s6]
    
    dupes, uniques = separate_dupes(objects, lambda item: item.a)
    print(dupes)
    print(uniques)
    

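    Result (the order of the groups depends on dictionary iteration order, so it can differ between Python versions):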

    [Spam((2, 1, 1, 8)), Spam((2, 1, 1, 8)), Spam((1, 2, 1, 4)), Spam((1, 2, 1, 4))]
    [Spam((1, 1, 1, 4)), Spam((2, 2, 1, 4))]
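
If both lists should keep the objects in their original order regardless of how the dictionary iterates, one possible variant counts the keys first and then filters the sequence. This is a minimal sketch, not part of the original answer: split_by_count is a hypothetical name, and it assumes that seq can be iterated more than once (a list, for example) and that the keys are hashable.

```python
from collections import Counter


class Spam(object):
    def __init__(self, a):
        self.a = a

    # Only for readable output, as in the answer above.
    def __repr__(self):
        return "Spam({})".format(self.a)


def split_by_count(seq, key_func):
    """Return (dupes, uniques), each in the original order of seq.

    Hypothetical helper: assumes seq is re-iterable and keys are hashable.
    """
    # First pass: count how many items share each key.
    counts = Counter(key_func(item) for item in seq)
    # Second and third passes: route each item by the count of its key.
    dupes = [item for item in seq if counts[key_func(item)] > 1]
    uniques = [item for item in seq if counts[key_func(item)] == 1]
    return dupes, uniques


objects = [
    Spam((1, 1, 1, 4)),
    Spam((1, 2, 1, 4)),
    Spam((1, 2, 1, 4)),
    Spam((2, 2, 1, 4)),
    Spam((2, 1, 1, 8)),
    Spam((2, 1, 1, 8)),
]

dupes, uniques = split_by_count(objects, lambda item: item.a)
print(dupes)    # [Spam((1, 2, 1, 4)), Spam((1, 2, 1, 4)), Spam((2, 1, 1, 8)), Spam((2, 1, 1, 8))]
print(uniques)  # [Spam((1, 1, 1, 4)), Spam((2, 2, 1, 4))]
```

Like the dictionary-based grouping, this does a constant amount of work per item, whereas the version in the question tests membership against lists and so slows down as the input grows.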