Tags: python, performance, extract, python-itertools

Efficiently extracting set of unique values from lists within a dictionary


I have a data structure which looks like this:

{'A': [2, 3, 5, 6], 'B': [1, 2, 4, 7], 'C': [1, 3, 4, 5, 7], 'D': [1, 4, 5, 6], 'E': [3, 4]}

Using Python, I need to extract this:

{1, 2, 3, 4, 5, 6, 7}

I need this because a mathematical equation further downstream requires the count of distinct values.

Here is my current implementation, which works (complete code example):

from itertools import chain

# Create some mock data for testing
dictionary_with_lists = {'A': [2, 3, 5, 6],
                         'B': [1, 2, 4, 7],
                         'C': [1, 3, 4, 5, 7],
                         'D': [1, 4, 5, 6],
                         'E': [3, 4]}

print(dictionary_with_lists)

# Output: {'A': [2, 3, 5, 6], 'B': [1, 2, 4, 7], 'C': [1, 3, 4, 5, 7], 'D': [1, 4, 5, 6], 'E': [3, 4]}

# Flatten dictionary to list of lists, discarding the keys
list_of_lists = [dictionary_with_lists[i] for i in dictionary_with_lists]
print(f'list_of_lists: {list_of_lists}')

# Output: list_of_lists: [[2, 3, 5, 6], [1, 2, 4, 7], [1, 3, 4, 5, 7], [1, 4, 5, 6], [3, 4]]

# Use itertools to flatten the list
flat_list = list(chain.from_iterable(list_of_lists))
print(f'flat_list: {flat_list}')

# Output: flat_list: [2, 3, 5, 6, 1, 2, 4, 7, 1, 3, 4, 5, 7, 1, 4, 5, 6, 3, 4]

# Convert list to set to get only unique values
set_of_unique_items = set(flat_list)
print(f'set_of_unique_items: {set_of_unique_items}')

# Output: set_of_unique_items: {1, 2, 3, 4, 5, 6, 7}

While this works, I suspect there might be a simpler and more efficient approach.

What would be a more efficient implementation which does not diminish code readability?

My real-world dictionary contains hundreds of thousands or millions of lists of arbitrary lengths.


Solution

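  • Try this. Passing d.values() straight to chain.from_iterable and wrapping the result in set() skips both intermediate lists: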

    from itertools import chain
    
    d = {'A': [2, 3, 5, 6], 'B': [1, 2, 4, 7], 'C': [1, 3, 4, 5, 7], 'D': [1, 4, 5, 6], 'E': [3, 4]}
    print(set(chain.from_iterable(d.values())))
    

    Output:

    {1, 2, 3, 4, 5, 6, 7}
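
Since the question ultimately needs the count of distinct values, here is a minimal sketch that wraps the approach above in a function and returns the count directly. The function names count_distinct and count_distinct_union are hypothetical, chosen only for illustration. The second variant uses set().union(*...), a standard alternative that is not part of the original answer. Which one is faster on millions of lists is not something this sketch establishes, so it is worth timing both on the real data.

```python
from itertools import chain


def count_distinct(dict_of_lists):
    """Count distinct values across all lists (hypothetical helper name)."""
    # chain.from_iterable is lazy: values are fed to set() one at a time,
    # so no intermediate flat list is built.
    return len(set(chain.from_iterable(dict_of_lists.values())))


def count_distinct_union(dict_of_lists):
    """Alternative using set().union (hypothetical helper name)."""
    # set().union accepts any number of iterables; the * unpacks every
    # list in the dictionary as a separate argument.
    return len(set().union(*dict_of_lists.values()))


d = {'A': [2, 3, 5, 6], 'B': [1, 2, 4, 7], 'C': [1, 3, 4, 5, 7],
     'D': [1, 4, 5, 6], 'E': [3, 4]}

print(count_distinct(d))        # 7
print(count_distinct_union(d))  # 7
```

Both variants also handle an empty dictionary and return 0 in that case.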