I have a dictionary generated using defaultdict:
{"GGGAAATTTCCCTTTGGGAAACGG": ["9/1", "9/2", "1/1.1", "9/2.1"],
"GGGAAATTTCCCTTTGGGAAAGCC": ["9/2", "9/2.1"],
"GGGAAATTTCCCTTTGGGAAAGGG": ["1/1", "1/2", "9/1", "1/1.1"]}
One of the entries is a subset of another in terms of its values:
"GGGAAATTTCCCTTTGGGAAAGCC": ["9/2", "9/2.1"]
is a subset of
"GGGAAATTTCCCTTTGGGAAACGG": ["9/1", "9/2", "1/1.1", "9/2.1"]
How would I go about collapsing the dictionary so that in the end I would get either of these results?
{"GGGAAATTTCCCTTTGGGAAACGG": ["9/1", "9/2", "1/1.1", "9/2.1"],
"GGGAAATTTCCCTTTGGGAAAGGG": ["1/1", "1/2", "9/1", "1/1.1"]}
or
{["GGGAAATTTCCCTTTGGGAAACGG", "GGGAAATTTCCCTTTGGGAAAGCC"]:
["9/1", "9/2", "1/1.1", "9/2.1"],
"GGGAAATTTCCCTTTGGGAAAGGG":
["1/1", "1/2", "9/1", "1/1.1"]}
Edit:
So, as requested, this was my attempt:
# dd is my defaultdict
for keys, values in dd.iteritems():
    if all(for item in values:
               if item in dd.items():
                   return True
               else:
                   return False):
        print keys
You can try this:
mydict = {"GGGAAATTTCCCTTTGGGAAACGG": ["9/1", "9/2", "1/1.1", "9/2.1"],
"GGGAAATTTCCCTTTGGGAAAGCC": ["9/2", "9/2.1"],
"GGGAAATTTCCCTTTGGGAAAGGG": ["1/1", "1/2", "9/1", "1/1.1"]}
>>> dict([i for i in mydict.items() if not any(set(j) > set(i[1]) for j in mydict.values())])
{'GGGAAATTTCCCTTTGGGAAACGG': ['9/1', '9/2', '1/1.1', '9/2.1'],
'GGGAAATTTCCCTTTGGGAAAGGG': ['1/1', '1/2', '9/1', '1/1.1']}
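Or simply:
for i in list(mydict.items()):  # iterate over a copy so popping is safe
    for j in list(mydict.values()):
        if set(j) > set(i[1]):  # j is a proper superset of i's values
            mydict.pop(i[0])
            break  # already removed; stop so the key is not popped twice
>>> mydict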
{'GGGAAATTTCCCTTTGGGAAACGG': ['9/1', '9/2', '1/1.1', '9/2.1'],
'GGGAAATTTCCCTTTGGGAAAGGG': ['1/1', '1/2', '9/1', '1/1.1']}
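If it helps, here is a rough sketch of the same idea wrapped in two functions, one for each result asked about in the question. The names collapse_subsets and group_subsets are made up for this example. Since a list cannot be used as a dictionary key, the grouped version assumes tuples of keys are acceptable instead, and it uses a tuple for every key (including entries that absorbed nothing) to keep the result uniform. A key whose values are a subset of several other entries would show up in each of those groups.

```python
def collapse_subsets(d):
    """Drop every entry whose values are a proper subset of another entry's values."""
    # Convert each list to a set once, instead of on every comparison.
    as_sets = {key: set(values) for key, values in d.items()}
    return {
        key: values
        for key, values in d.items()
        # `a < b` on sets is True only when a is a proper subset of b.
        if not any(as_sets[key] < other for other in as_sets.values())
    }


def group_subsets(d):
    """Like collapse_subsets, but record the absorbed keys next to the surviving key."""
    as_sets = {key: set(values) for key, values in d.items()}
    grouped = {}
    for key, values in d.items():
        if any(as_sets[key] < other for other in as_sets.values()):
            continue  # this entry is absorbed by a larger one
        absorbed = sorted(k for k, s in as_sets.items() if s < as_sets[key])
        # Tuples are hashable, so they can serve as dictionary keys.
        grouped[(key,) + tuple(absorbed)] = values
    return grouped


mydict = {"GGGAAATTTCCCTTTGGGAAACGG": ["9/1", "9/2", "1/1.1", "9/2.1"],
          "GGGAAATTTCCCTTTGGGAAAGCC": ["9/2", "9/2.1"],
          "GGGAAATTTCCCTTTGGGAAAGGG": ["1/1", "1/2", "9/1", "1/1.1"]}

collapsed = collapse_subsets(mydict)
grouped = group_subsets(mydict)
```

Neither function modifies mydict; both build a new dictionary. Entries with identical value sets are all kept, because only proper subsets are treated as redundant.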