find, collect duplicates in list of lists/sets


In Python, I have a list of 2-element lists and a list of integers with the same length, e.g.,

a = [
    [1, 2],
    [3, 2],
    [4, 66],
    [2, 3]
    ]

b = [
    1,
    31,
    31,
    44
    ]

The k-th entry in a can be thought of as being associated with the k-th entry in b.

The entries [3, 2] and [2, 3] are really the same for me, and I'd like a uniquified with that in mind. Also, for each entry of the new unique list, I would like a list of the entries of b that belong to it. For the above example,

a2 = [
    [1, 2],
    [3, 2],  # or [2, 3]
    [4, 66]
    ]

b2 = [
    [1],
    [31, 44],
    [31]
    ]

b2[0] is [1] since [1, 2] is associated with only 1. b2[1] is [31, 44] since [2, 3] (which equals [3, 2]) is associated with 31 and 44 in a.

It's possible to go through a entry by entry, make each 2-list a frozenset, sort it into a dictionary etc. Needless to say, this doesn't perform very well if a and b are large.

Any hints on how to handle this smarter? (List comprehensions?)


Solution

  • If you want to maintain order and group, I don't think you will get much better than grouping with an OrderedDict:

    from collections import OrderedDict
    a = [
        [1, 2],
        [3, 2],
        [4, 66],
        [2, 3]
        ]
    
    b = [1, 31, 31, 44]
    d = OrderedDict()
    # Key each pair by its frozenset so [3, 2] and [2, 3] land in the same group.
    for ind, f in enumerate(map(frozenset, a)):
        d.setdefault(f, []).append(b[ind])
    
    print(list(d), list(d.values()))
    

    Which would give you:

    [frozenset({1, 2}), frozenset({2, 3}), frozenset({66, 4})] [[1], [31, 44], [31]]
    

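    If the order in which the keys are first seen is irrelevant, use a defaultdict (on Python 3.7+ every dict, including defaultdict, keeps insertion order anyway):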

    from collections import defaultdict
    a = [
        [1, 2],
        [3, 2],
        [4, 66],
        [2, 3]
        ]
    
    b = [1, 31, 31, 44]
    d = defaultdict(list)
    for ind, f in enumerate(map(frozenset, a)):
        d[f].append(b[ind])
    
    print(list(d), list(d.values()))
    

    Which would give you:

    [frozenset({1, 2}), frozenset({2, 3}), frozenset({66, 4})] [[1], [31, 44], [31]]
    

    If you really want lists or tuples:

    print(list(map(list, d)), list(d.values()))
    

    Which would give you:

    [[1, 2], [2, 3], [66, 4]] [[1], [31, 44], [31]]
    
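Converting the frozenset keys back to lists loses the original element order ([3, 2] comes back as [2, 3], [4, 66] as [66, 4]). If the first-seen form should be kept, as in the a2 from the question, something along these lines might work. This is only a sketch: the helper name group_unordered is made up for illustration, and it assumes Python 3.7+, where a plain dict preserves insertion order.

```python
def group_unordered(a, b):
    """Group the values of b by the unordered contents of the matching entry of a."""
    # frozenset key -> (first-seen original pair, list of associated b values)
    groups = {}
    for pair, value in zip(a, b):
        key = frozenset(pair)
        if key not in groups:
            groups[key] = (pair, [])
        groups[key][1].append(value)
    a2 = [pair for pair, _ in groups.values()]
    b2 = [values for _, values in groups.values()]
    return a2, b2


a = [[1, 2], [3, 2], [4, 66], [2, 3]]
b = [1, 31, 31, 44]
a2, b2 = group_unordered(a, b)
print(a2)  # [[1, 2], [3, 2], [4, 66]]
print(b2)  # [[1], [31, 44], [31]]
```

It is still a single pass over the data with one hash lookup per entry, so the cost should be comparable to the versions above.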

    For Python 2, use itertools.imap in place of map (and itertools.izip in place of zip, if you pair a and b with zip) so that no intermediate lists are built.
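One caveat with the frozenset keys used above: a pair with a repeated element, such as [2, 2], collapses to frozenset({2}) and would come back as the one-element list [2]. If such pairs can occur, a sorted tuple might be a safer key. The variant below is a sketch that pairs a and b with zip; the [2, 2] entry and its value 7 are invented for illustration and are not part of the question's data.

```python
from collections import defaultdict

# Hypothetical input: like the question's data, but with a repeated-element pair.
a = [[1, 2], [3, 2], [2, 2], [2, 3]]
b = [1, 31, 7, 44]

d = defaultdict(list)
for pair, value in zip(a, b):
    # sorted() puts [3, 2] and [2, 3] into the same canonical form (2, 3),
    # while (2, 2) keeps both of its elements.
    d[tuple(sorted(pair))].append(value)

keys = list(d)
values = list(d.values())
print(keys)    # [(1, 2), (2, 3), (2, 2)]
print(values)  # [[1], [31, 44], [7]]
```

Sorting a two-element pair is cheap, so this should not change the overall linear running time.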