pythonalgorithmperformancelisttime-complexity

Efficiently finding duplicates in a list

I have the function below which searches an array for duplicate entries and then returns a list of the duplicates. I'd like to speed this piece of my code up, can anyone suggest a more efficient way?

Code:

``````def findDupe(array):
dupelist = []
for i in range(len(array)):
for j in range(len(array)):
comp1 = array[i]
comp2 = array[j]
if comp1 == comp2 and i!=j:
if comp2 not in dupelist:
dupelist.append(comp2)
return dupelist
``````

Solution

• The idea here is to do a single sweep in linear time. You can use a counter to do this. The idea is to count each element and then return all those which were counted multiple times.

``````from collections import Counter

def get_duplicates(array):
c = Counter(array)
return [k for k in c if c[k] > 1]
``````

The approach above is linear in complexity, but makes two passes over the input - once, to count (this is abstracted away by the `Counter` constructor) and the other is to return non-unique values in the list comp. You can optimise this a bit using a `yield` statement, and return duplicates as you find them.

``````def get_duplicates(array):
c = Counter()
seen = set()
for i in array:
c[i] += 1
if c[i] > 1 and i not in seen:
This results in a compulsory `if` check and extra space in the form of a `set`, but you reduce two passes to one.