python string list comparison

I only find a fraction of the existing duplicates in two Lists of Strings


This method receives two lists of strings as parameters and is supposed to return all strings in uncheckeds that do not have a match in valids. (The strings are e-mail addresses.)

If I input two lists that are exactly the same, some of the duplicates are found, but not all of them. The same thing happens if I input two different lists: some duplicates are found and removed, some are not.

I have found answers to similar questions, but I can't find information regarding this specific problem.

My method:

def getUnmatchedAddresses(valids, uncheckeds):
    
    unmatcheds = uncheckeds

    for unchecked in uncheckeds:
        for valid in valids:
            if(unchecked == valid):
                unmatcheds.remove(valid)
                
    return unmatcheds

Solution

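  • The issue comes from unmatcheds = uncheckeds, which does not copy the list at all: it only binds a second name to the same list object. As a result, unmatcheds.remove(valid) removes items from the very list the outer loop is iterating over, so the iterator skips the element that follows each removal. You could copy the list before iterating over it (for example with list(uncheckeds)), but that adds an extra copy plus a linear-time remove() call for every match. It is simpler to build the new list directly using the following code: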

    def getUnmatchedAddresses(valids, uncheckeds):
        return [unchecked for unchecked in uncheckeds 
                   if not any(unchecked == valid for valid in valids)]
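To make the cause visible, here is a minimal sketch that reproduces the skipping behaviour on a small list and then shows a set-based variant of the function. The name get_unmatched_addresses and the example.com addresses are made up for illustration, and the variant assumes the addresses are hashable strings that should match exactly (case-sensitive), as in the original comparison.

```python
# Plain assignment does not copy a list; both names refer to the same object.
original = ["a@example.com", "b@example.com"]
alias = original
print(alias is original)  # True

# Removing from a list while iterating over it skips the element
# that slides into the position of the one just removed.
items = ["a@example.com", "b@example.com", "c@example.com", "d@example.com"]
for item in items:
    items.remove(item)
print(items)  # ['b@example.com', 'd@example.com']


def get_unmatched_addresses(valids, uncheckeds):
    """Return the entries of uncheckeds that do not appear in valids.

    Hypothetical variant: assumes exact, case-sensitive string matches.
    """
    # Building a set once makes each membership test O(1) on average.
    valid_set = set(valids)
    # A new list is built, so the input lists are never mutated.
    return [address for address in uncheckeds if address not in valid_set]


valids = ["a@example.com", "b@example.com"]
uncheckeds = ["a@example.com", "c@example.com", "b@example.com", "d@example.com"]
print(get_unmatched_addresses(valids, uncheckeds))
# ['c@example.com', 'd@example.com']
```

For short lists the any(...) version above is perfectly fine; the set only pays off when valids is large, because it avoids scanning valids once per unchecked address.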