python list algorithm performance data-structures

Efficient algorithm that returns a list of unique lists given a list of of lists as input

Given a python list of lists containing numbers i.e lists = [ [1, 2], [2, 1], [3, 4] ], the problem is to return a list of all unique lists from the input list. A list is considered a duplicate if it can be generated from another list by reordering the items in the list. i.e [2, 1] is a duplicate of [1, 2]. Given the input [ [1, 2], [2, 1], [3, 4] ], the output should be [ [1, 2], [3, 4]] . Any reordering of [ [1, 2], [3, 4]] is also correct i.e [1, 2], [4, 3]],

My approach is to first sort all lists in the input list, convert the lists to tuples, use a set data structure to filter out duplicate tuples, and finally convert the unique tuples back to lists. The time complexity for sorting all lists is O(m*nlogn) where m is the number of lists and n is the size of each list(assuming same size lists). Converting the lists to tuples takes O(mn) time, creating a set from the tuples takes O(mn), converting the set of unique tuples back to lists also takes O(mn) making the total time complexity O(mnlogn + mn + mn + mn) = O(mnlogn).

Can we do any better than O(mnlogn)?

Code:

def find_unique(lists):
  sorted_lists = [ sorted(lst) for lst in lists]
  tuples = [tuple(lst) for lst in sorted_lists]
  unique_tuples = set(tuples)
  return list(unique_tuples)

Solution

You can get an O(m*n) solution as long as the "key" you are using is O(m*n). This can be accomplished in two ways.

If the inner lists can't contain duplicates, then a set of frozen sets is an elegant solution. Note, frozenset(mylist) is O(n):

def unique(lists):
    seen = set()
    result = []
    for sub in lists:
        key = frozenset(sub)
        if key not in seen:
            result.append(sub)
            seen.add(key)
    return result

The above returns the first seen "unique" list that is actually in the input. If any order of a unique list is valid, even an order not seen in the original input (I presume this is the case because that is what your solution does) then perhaps more tersely:

def unique(lists):
    return list(map(list, set(map(frozenset, lists))))

If the inner lists can contain duplicates, then the above won't work, but you can use a collections.Counter which can act like a multiset, then use a frozent-set of the items in the counter:

from collections import Counter

def unique(lists):
    seen = set()
    result = []
    for sub in lists:
        key = frozenset(Counter(sub).items())
        if key not in seen:
            result.append(sub)
            seen.add(key)
    return result

Note, if n is smallish, I bet the sorted solution is faster.