I'm trying to compare two lists based on the index number of each list:
list1 = [
    ['1', ['a']],
    ['2', ['b', 'c', 'd']],
    ['3', ['e']],
    ['4', ['f', 'g']],
    ['5', ['h']]
]
list2 = [
    ['1', ['e']],
    ['2', ['f', 'c']],
    ['3', ['h', 'g', 'a', 'd']],
    ['4', ['b']],
    ['5', ['b']],
]
What I would like to do is compare each row of list1 with all the rows in list2 and return the matching values. For instance, in this example the desired outcome would be:
1(list1) - 3(list2),
2-2,
2-3,
2-4,
2-5,
3-1,
4-2,
4-3
8 in total. Then I'd like to remove the duplicates, such as 2-4 and 4-2, or 1-3 and 3-1.
You are looking for the set intersections of the product of your 'labels', where each pair is itself a set too (order doesn't matter if 2-4 and 4-2 are considered the same).
Intersections are most efficiently tested with the Python set type, so when building the dictionaries below, let's convert the values to sets up front.
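For instance, here is a minimal sketch of the intersection test, using three rows from the question (the variable names are only for illustration):

```python
row_a = set(['b', 'c', 'd'])   # list1, label '2'
row_b = set(['f', 'c'])        # list2, label '2'
row_c = set(['e'])             # list2, label '1'

common = row_a & row_b     # the shared values, here just 'c'
nothing = row_a & row_c    # no shared values, so an empty set

# An empty set is falsy, so the intersection can be used directly in an `if`.
print(common, bool(common), bool(nothing))
```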
So we need the unique labels, and a way to look up the associated list for each label. That's a job for dictionaries, so convert your lists to dictionaries first, and get the union of their keys. Then turn each pairing into a set as well (a frozenset, so it can be stored in another set), so that {'2', '4'} and {'4', '2'} are seen as the same, storing the results in another set. Note that 2-2 becomes 2 in this scenario, as a set would store '2' just once.
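As a quick illustration of why sets work for the pairings, here is a minimal sketch (the variable names are hypothetical, chosen only for the demonstration):

```python
# frozenset ignores order, so both orderings of a pair compare equal
pair_a = frozenset(('2', '4'))
pair_b = frozenset(('4', '2'))
same_pair = pair_a == pair_b

# a pair of identical labels collapses to a single element
self_pair = frozenset(('2', '2'))

# frozensets are hashable, so they can be stored in another set,
# which removes the duplicate pairing automatically
unique_pairs = {pair_a, pair_b, self_pair}
print(same_pair, len(self_pair), len(unique_pairs))
```

This should print `True 1 2`: the two orderings are one entry, and the doubled label is stored once.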
Then all we have to do is test if there is an intersection between the two lists associated with the picked combination of keys, and include that combo if there is:
from itertools import product
dict1 = {k: set(l) for k, l in list1}
dict2 = {k: set(l) for k, l in list2}
keys = dict1.keys() | dict2.keys() # all unique keys in both
found = {
    frozenset((k1, k2))
    for k1, k2 in product(keys, repeat=2)
    if dict1.get(k1, set()) & dict2.get(k2, set())
}
Demo:
>>> from itertools import product
>>> dict1 = {k: set(l) for k, l in list1}
>>> dict2 = {k: set(l) for k, l in list2}
>>> keys = dict1.keys() | dict2.keys() # all unique keys in both
>>> {
...     frozenset((k1, k2))
...     for k1, k2 in product(keys, repeat=2)
...     if dict1.get(k1, set()) & dict2.get(k2, set())
... }
{frozenset({'3', '4'}), frozenset({'2'}), frozenset({'3', '5'}), frozenset({'2', '5'}), frozenset({'2', '3'}), frozenset({'2', '4'}), frozenset({'1', '3'})}
If you must have doubled-up references, you can post-process the result:
for combo in found:
    try:
        a, b = combo
    except ValueError:  # doesn't contain 2 values, assume 1
        a, = b, = combo
    print(f'{a}-{b}')
The order will vary depending on the current random hash seed, so you may want to sort the results. I get this output:
3-4
2-2
3-5
2-5
2-3
2-4
1-3
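If a stable order matters, one possible approach is to normalise each combo into a sorted tuple and then sort the tuples. This is only a sketch: the helper `as_pair` is a hypothetical name, `found` is repeated from the demo output so the snippet runs on its own, and it assumes every label is a numeric string.

```python
# `found` as shown in the demo above, repeated so this runs standalone
found = {
    frozenset({'3', '4'}), frozenset({'2'}), frozenset({'3', '5'}),
    frozenset({'2', '5'}), frozenset({'2', '3'}), frozenset({'2', '4'}),
    frozenset({'1', '3'}),
}

def as_pair(combo):
    """Return the combo as an ordered 2-tuple, doubling up a single label."""
    labels = sorted(combo, key=int)  # assumes labels are numeric strings
    return (labels[0], labels[-1])   # first == last when there is one label

# sort numerically on both labels so the output order is deterministic
pairs = sorted(map(as_pair, found), key=lambda p: (int(p[0]), int(p[1])))
lines = [f'{a}-{b}' for a, b in pairs]
print('\n'.join(lines))
```

With the data above this should print 1-3, 2-2, 2-3, 2-4, 2-5, 3-4 and 3-5, one per line, regardless of the hash seed.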