
Behavior of a list comprehension that references itself


I'm retrieving a list of (name, id) pairs and I need to make sure there are no duplicate names, regardless of the id.

# Sample data
filesID = [{'name': 'file1', 'id': '353'}, {'name': 'file2', 'id': '154'},
           {'name': 'file3', 'id': '1874'}, {'name': 'file1', 'id': '14'}]

I managed to get the desired output with nested loops:

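uniqueFilesIDLoops = []
for pair in filesID:
    found = False
    # Look for an already-kept pair with the same name
    for d in uniqueFilesIDLoops:
        if d['name'] == pair['name']:
            found = True
            break  # one match is enough
    if not found:
        uniqueFilesIDLoops.append(pair)  # first occurrence of this name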
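The inner loop rescans the output list for every pair, so the work grows quadratically with the input. A minimal sketch of the same keep-the-first-occurrence logic that instead records names already kept in a set (the name `seen_names` is illustrative, not from the original code):

```python
filesID = [{'name': 'file1', 'id': '353'}, {'name': 'file2', 'id': '154'},
           {'name': 'file3', 'id': '1874'}, {'name': 'file1', 'id': '14'}]

uniqueFilesIDLoops = []
seen_names = set()  # illustrative helper: names already copied to the output
for pair in filesID:
    # Set membership is a constant-time lookup on average,
    # instead of a scan over uniqueFilesIDLoops.
    if pair['name'] not in seen_names:
        seen_names.add(pair['name'])
        uniqueFilesIDLoops.append(pair)

print(uniqueFilesIDLoops)
```

The result is the same as the nested-loop version: the first dict for each name, in input order.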

But I can't get it to work with a list comprehension. Here's what I've tried so far:

uniqueFilesIDComprehension = []
uniqueFilesIDComprehension = [
    pair for pair in filesID if pair['name'] not in [
        d['name'] for d in uniqueFilesIDComprehension
    ]
]

Outputs:

# Original data
[{'name': 'file1', 'id': '353'}, {'name': 'file2', 'id': '154'},
 {'name': 'file3', 'id': '1874'}, {'name': 'file1', 'id': '14'}]

# Data obtained with list comprehension
[{'name': 'file1', 'id': '353'}, {'name': 'file2', 'id': '154'},
 {'name': 'file3', 'id': '1874'}, {'name': 'file1', 'id': '14'}]

# Data obtained with loops (and desired output)
[{'name': 'file1', 'id': '353'}, {'name': 'file2', 'id': '154'},
 {'name': 'file3', 'id': '1874'}]

I was thinking that maybe the reference to uniqueFilesIDComprehension inside the list comprehension is not updated at each iteration, so it keeps using [] and never finds a matching name.
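A small probe can test this hypothesis. The sketch below wraps the inner lookup in a hypothetical helper, `names_seen` (added only for illustration), which records the length of the list the comprehension actually iterates over on each pass:

```python
filesID = [{'name': 'file1', 'id': '353'}, {'name': 'file2', 'id': '154'},
           {'name': 'file3', 'id': '1874'}, {'name': 'file1', 'id': '14'}]

observed = []  # length of the list seen by the inner lookup, per iteration

def names_seen(lst):
    """Hypothetical helper: record what the comprehension sees, return its names."""
    observed.append(len(lst))
    return [d['name'] for d in lst]

uniqueFilesIDComprehension = []
uniqueFilesIDComprehension = [
    pair for pair in filesID
    if pair['name'] not in names_seen(uniqueFilesIDComprehension)
]

print(observed)                         # [0, 0, 0, 0]
print(len(uniqueFilesIDComprehension))  # 4
```

Every call receives the same empty list, because the name is rebound to the new list only after the whole right-hand side has finished evaluating.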


Solution

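  • You cannot access the contents of a list comprehension while it is being built: the new list is bound to a name only after the whole comprehension has been evaluated. In your code, uniqueFilesIDComprehension inside the comprehension still refers to the empty list assigned on the line before, so [d['name'] for d in uniqueFilesIDComprehension] is always empty and nothing gets filtered out.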

    The simplest way to remove duplicates would be:

    list({el['name']: el for el in filesID}.values()) - this creates a dictionary keyed by each element's name, so every time you encounter a duplicate name, the new element overwrites the earlier one. Once the dict is built, all you need to do is take its values and convert them to a list. Note that this keeps the last element for each name. If you want to keep the first element with each name instead, you can build the dictionary in a for loop:

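    out = {}
    for el in filesID:
        if el['name'] not in out:  # store only the first element per name
            out[el['name']] = el
    uniqueFilesID = list(out.values())  # back to a list of dicts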

    Finally, one thing to consider with any of these solutions: since you do not care about the id, do you really need to keep it at all?

    If not, I'd ask myself whether a plain set of the names would be a valid choice as well:

    out = {el['name'] for el in filesID}
    print(out)

    Output (sets are unordered, so the order may differ between runs): {'file1', 'file3', 'file2'}
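For reference, a self-contained sketch that runs the dictionary-based options side by side on the sample data. The dict comprehension keeps the last element per name; the loop keeps the first. The third variant is a swapped-in idiom not taken from the answer above: a single comprehension that keeps the first element by consulting a separate `seen` set and updating it through `set.add`, which returns `None`. It works, but hides a side effect inside the condition, so the explicit loop is usually easier to read. Because dicts preserve insertion order (guaranteed since Python 3.7), all three list the names in first-seen order.

```python
filesID = [{'name': 'file1', 'id': '353'}, {'name': 'file2', 'id': '154'},
           {'name': 'file3', 'id': '1874'}, {'name': 'file1', 'id': '14'}]

# 1. Dict comprehension: a later duplicate overwrites the earlier value,
#    so the LAST element per name wins (the key keeps its first position).
keep_last = list({el['name']: el for el in filesID}.values())

# 2. Explicit loop: a name is stored only the first time it appears,
#    so the FIRST element per name wins.
out = {}
for el in filesID:
    if el['name'] not in out:
        out[el['name']] = el
keep_first = list(out.values())

# 3. Alternative idiom (not from the answer): `seen.add(...)` returns None,
#    so `not (name in seen or None)` is True only for unseen names, and the
#    name is recorded as a side effect of evaluating the condition.
seen = set()
keep_first_comp = [el for el in filesID
                   if not (el['name'] in seen or seen.add(el['name']))]

print(keep_last)
print(keep_first)
print(keep_first == keep_first_comp)  # True
```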