Search code examples
algorithmreduction

Generate pairs having the same attributes from list


Assume you have a list of items, each with a set of attributes.

What is an efficient algorithm for generating all pairs from the list having the same attributes?

For example, given a list:

[('item1', {'a','b'}), ('item2', {'a'}), ('item3', {'c','b'}), ('item4', {'b'})]

We should return the following list of four pairs, out of the total possible six:

('item1', 'item2') # both have attribute 'a'
('item1', 'item3') # both have attribute 'b'
('item1', 'item4') # both have attribute 'b'
('item3', 'item4') # both have attribute 'b'

Now, the trivial approach would be to first generate the list of all possible n(n+1)/2 pairs, and then filter out those without similar attributes, but I suspect this approach is inefficient, especially if the number of pairs is very large.

Any suggestions?


Solution

  • I would suggest a two phase algorithm:

    arr = [('item1', {'a','b'}), ('item2', {'a'}), ('item3', {'c','b'}), ('item4', {'b'})]
    
    # 1. create map with for each attribute the list of items that have it
    mp = {}
    for lst in arr:
        for prop in lst[1]:
            if prop not in mp: mp[prop] = []
            mp[prop].append(lst[0])
    
    # 2. for each attribute: add the pairs of items to the result set
    result = set()
    for prop in mp:
        items = mp[prop]
        # collect all pairs in items list
        for p1 in range(len(items)):
            for p2 in range(p1+1,len(items)):
                result.add((items[p1],items[p2]))
    
    print (result)
    

    Output:

    {('item1', 'item4'), ('item1', 'item2'), ('item3', 'item4'), ('item1', 'item3')}