
Finding duplicates across several lists


In my case, a duplicate is not just an item that reappears within one list: the items at the same positions in the other lists must repeat as well. For example:

list1 = [1,2,3,3,3,4,5,5]
list2 = ['a','b','b','c','b','d','e','e']
list3 = ['T1','T2','T3','T4','T3','T4','T5','T5']

So the real duplicates across all 3 lists are at positions [2,4] and [6,7]. At positions 2 and 4, list1 repeats 3, list2 repeats 'b' at those same positions, and list3 repeats 'T3'. In the second case, 5, 'e' and 'T5' are repeated at the same positions (6 and 7) in their lists. I have a hard time producing these results "automatically" in one step.
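To make that definition concrete: treat the items at index i of every list as one "row"; a real duplicate is a row that occurs at two or more indices. A minimal brute-force sketch with nested loops, using a hypothetical helper `duplicate_rows_bruteforce` (not from the question), compares every pair of positions:

```python
# Sample data from the question
list1 = [1, 2, 3, 3, 3, 4, 5, 5]
list2 = ['a', 'b', 'b', 'c', 'b', 'd', 'e', 'e']
list3 = ['T1', 'T2', 'T3', 'T4', 'T3', 'T4', 'T5', 'T5']

def duplicate_rows_bruteforce(*lists):
    """Hypothetical helper: groups of positions whose items match in every list.

    Compares every pair of positions, so it is O(n^2) in the list length.
    """
    n = min(len(lst) for lst in lists)
    groups = []
    used = set()  # positions already placed in a group
    for i in range(n):
        if i in used:
            continue
        group = [i]
        for j in range(i + 1, n):
            # i and j are duplicates only if every list agrees at both positions
            if all(lst[i] == lst[j] for lst in lists):
                group.append(j)
        if len(group) > 1:
            groups.append(group)
            used.update(group)
    return groups

print(duplicate_rows_bruteforce(list1, list2, list3))  # [[2, 4], [6, 7]]
```

Index 3 is not reported even though list1 holds 3 there, because list2 holds 'c' rather than 'b'.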

1) First I find the duplicates in the first list (and likewise in the second):

# Find duplicated part numbers (exact matches)
def list_duplicates(seq):
  seen = set()
  seen_add = seen.add
  # adds all elements it doesn't know yet to seen and all other to seen_twice
  seen_twice = set( x for x in seq if x in seen or seen_add(x) )
  # turn the set into a list (as requested)
  return list(seen_twice)
# List of Duplicated part numbers
D_list1 = list_duplicates(list1)
D_list2 = list_duplicates(list2)
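Step 1 can also be written with `collections.Counter`, a swapped-in alternative to the set-based `list_duplicates` above. One practical difference: `list(seen_twice)` follows set order, which for strings can change between runs because of hash randomization, so `D_list2[0]` may be 'b' in one run and 'e' in the next. The sketch below (hypothetical name `list_duplicates_counter`) keeps first-seen order, which `Counter` guarantees on Python 3.7+:

```python
from collections import Counter

list1 = [1, 2, 3, 3, 3, 4, 5, 5]
list2 = ['a', 'b', 'b', 'c', 'b', 'd', 'e', 'e']

def list_duplicates_counter(seq):
    """Hypothetical helper: values occurring more than once, in first-seen order."""
    counts = Counter(seq)  # value -> number of occurrences
    return [value for value, count in counts.items() if count > 1]

print(list_duplicates_counter(list1))  # [3, 5]
print(list_duplicates_counter(list2))  # ['b', 'e']
```

Like the original, this still looks at one list at a time; it does not check that the duplicates line up across lists.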

2) Then I find the positions of a given duplicate and look at those positions in the second list:

# find the row positions of the n-th duplicated part number
def list_position_duplicates(list1, n, D_list1):
    # every index in list1 whose value equals the n-th duplicate
    # (the original iterated over an undefined name `data`)
    return [i for i, x in enumerate(list1) if x == D_list1[n]]

# Actual calculation: find the row positions of one duplicated part number (first and last)
lpd_part = list_position_duplicates(list1,1,D_list1)
start = lpd_part[0]
end = lpd_part[-1]

lpd_parent = list_position_duplicates(list2[start:end+1],0,D_list2)
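Two details of this step are worth checking. `list2[start:end+1]` is a new list, so positions found inside it count from 0, not from `start`; `enumerate` accepts a `start` argument that shifts them back. The slice also covers every index between `start` and `end`, so for the duplicate 3 it includes index 3, which holds 'c'. Looking up only the exact positions in `lpd_part` avoids that gap. A small sketch with values hard-coded from the example:

```python
list1 = [1, 2, 3, 3, 3, 4, 5, 5]
list2 = ['a', 'b', 'b', 'c', 'b', 'd', 'e', 'e']

# positions of the duplicated value 3 in list1
lpd_part = [i for i, x in enumerate(list1) if x == 3]  # [2, 3, 4]
start, end = lpd_part[0], lpd_part[-1]

# Inside the slice, counting restarts at 0
relative = [i for i, x in enumerate(list2[start:end + 1]) if x == 'b']
# enumerate(..., start=start) maps them back to positions in list2
absolute = [i for i, x in enumerate(list2[start:end + 1], start=start) if x == 'b']
# Reading only the exact positions of the duplicate skips in-between indices
values_at_positions = [list2[i] for i in lpd_part]

print(relative)             # [0, 2]
print(absolute)             # [2, 4]
print(values_at_positions)  # ['b', 'c', 'b']
```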

So in step 2 I have to pass n (the index of the found duplicate in D_list1) by hand. I would like this step to run automatically, so that I get the positions of elements that are duplicated at the same positions in all the lists, for all duplicates at once rather than one by one "manually". I think it just needs a for loop or an if statement, but I'm new to Python, and none of the many combinations I tried worked.
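The loop described here can be sketched by iterating over every distinct value of list1 instead of passing n by hand, then splitting each value's positions by what the other lists contain at those positions. `positions_for_all_duplicates` is a hypothetical name; because it rescans list1 once per distinct value, it is quadratic in the worst case:

```python
list1 = [1, 2, 3, 3, 3, 4, 5, 5]
list2 = ['a', 'b', 'b', 'c', 'b', 'd', 'e', 'e']
list3 = ['T1', 'T2', 'T3', 'T4', 'T3', 'T4', 'T5', 'T5']

def positions_for_all_duplicates(first, *others):
    """Hypothetical helper: run step 2 for every duplicate of `first` at once."""
    result = []
    # dict.fromkeys gives the distinct values in first-seen order
    for value in dict.fromkeys(first):
        positions = [i for i, x in enumerate(first) if x == value]
        if len(positions) < 2:
            continue  # not a duplicate in the first list
        # Group those positions by the items the other lists hold there
        groups = {}
        for i in positions:
            key = tuple(lst[i] for lst in others)
            groups.setdefault(key, []).append(i)
        # Keep only groups that repeat in every list
        result.extend(g for g in groups.values() if len(g) > 1)
    return result

print(positions_for_all_duplicates(list1, list2, list3))  # [[2, 4], [6, 7]]
```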


Solution

  • You can use the items from all 3 lists at the same index as a dictionary key, and store the corresponding index as a value (in a list). If any key ends up with more than one index in its list, it is a duplicate:

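    # Python 3: the built-in zip() is lazy, like Python 2's itertools.izip
    
    def solve(*lists):
      d = {}
      # key: tuple of the items at index i in every list; value: list of indices
      for i, k in enumerate(zip(*lists)):
        d.setdefault(k, []).append(i)
      # a key stored at more than one index is a duplicate row
      for k, v in d.items():
        if len(v) > 1:
          print(k, v)
    
    solve(list1, list2, list3)
    #(3, 'b', 'T3') [2, 4]
    #(5, 'e', 'T5') [6, 7]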
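If you need the result as data rather than printed lines (for example, to get exactly [[2, 4], [6, 7]]), the same single-pass idea can return a dict. This sketch assumes Python 3.7+, where dicts keep insertion order, so groups come out in order of first occurrence. `find_row_duplicates` is a hypothetical name, and `collections.defaultdict` stands in for `setdefault`:

```python
from collections import defaultdict

list1 = [1, 2, 3, 3, 3, 4, 5, 5]
list2 = ['a', 'b', 'b', 'c', 'b', 'd', 'e', 'e']
list3 = ['T1', 'T2', 'T3', 'T4', 'T3', 'T4', 'T5', 'T5']

def find_row_duplicates(*lists):
    """Hypothetical helper: map each repeated row tuple to its indices."""
    positions = defaultdict(list)
    # zip() stops at the shortest list, so extra trailing items are ignored
    for i, row in enumerate(zip(*lists)):
        positions[row].append(i)
    # keep only rows that occur at two or more indices
    return {row: idx for row, idx in positions.items() if len(idx) > 1}

dups = find_row_duplicates(list1, list2, list3)
print(dups)                 # {(3, 'b', 'T3'): [2, 4], (5, 'e', 'T5'): [6, 7]}
print(list(dups.values()))  # [[2, 4], [6, 7]]
```

Each index is visited once and dict lookups are O(1) on average, so this takes a single linear pass over the rows.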