Given a python list of lists containing numbers i.e lists = [ [1, 2], [2, 1], [3, 4] ]
, the problem is to return a list of all unique lists from the input list. A list is considered a duplicate if it can be generated from another list by reordering the items in the list. i.e [2, 1]
is a duplicate of [1, 2].
Given the input [ [1, 2], [2, 1], [3, 4] ]
, the output should be [ [1, 2], [3, 4]]
. Any reordering of [ [1, 2], [3, 4]]
is also correct i.e [1, 2], [4, 3]],
My approach is to first sort all lists in the input list, convert the lists to tuples, use a set data structure to filter out duplicate tuples, and finally convert the unique tuples back to lists. The time complexity for sorting all lists is O(m*nlogn)
where m is the number of lists and n is the size of each list(assuming same size lists). Converting the lists to tuples takes O(mn)
time, creating a set from the tuples takes O(mn)
, converting the set of unique tuples back to lists also takes O(mn)
making the total time complexity O(mnlogn + mn + mn + mn) = O(mnlogn)
.
Can we do any better than O(mnlogn)
?
Code:
def find_unique(lists):
sorted_lists = [ sorted(lst) for lst in lists]
tuples = [tuple(lst) for lst in sorted_lists]
unique_tuples = set(tuples)
return list(unique_tuples)
You can get an O(m*n) solution as long as the "key" you are using is O(m*n). This can be accomplished in two ways.
If the inner lists can't contain duplicates, then a set of frozen sets is an elegant solution. Note, frozenset(mylist)
is O(n):
def unique(lists):
seen = set()
result = []
for sub in lists:
key = frozenset(sub)
if key not in seen:
result.append(sub)
seen.add(key)
return result
The above returns the first seen "unique" list that is actually in the input. If any order of a unique list is valid, even an order not seen in the original input (I presume this is the case because that is what your solution does) then perhaps more tersely:
def unique(lists):
return list(map(list, set(map(frozenset, lists))))
If the inner lists can contain duplicates, then the above won't work, but you can use a collections.Counter
which can act like a multiset, then use a frozent-set of the items in the counter:
from collections import Counter
def unique(lists):
seen = set()
result = []
for sub in lists:
key = frozenset(Counter(sub).items())
if key not in seen:
result.append(sub)
seen.add(key)
return result
Note, if n
is smallish, I bet the sorted
solution is faster.