I need to build a defaultdict, finaldict, from a list of query words taken from the first file.
finaldict maps a pair of words, one from each file, to the IDs they share. For example, foo and oof share the IDs 1243 and 1453. This is to make word-pair searches easy later: when I search for ('foo','oof'), it should return ['1243','1453'].
If I search finaldict for ('foo','duh'), it should return nothing, because that word pair doesn't share any ID.
query = ['foo','barbar']
finaldict = defaultdict(list)
finaldict = {('foo','oof'): ['1243','1453'],
             ('foo','rabrab'): ['2323'],
             ('barbar','duh'): ['6452']}
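To make the intended lookup behaviour concrete, here is a minimal sketch using the desired contents above (the IDs are just the ones from this example). One thing to watch for: indexing a defaultdict with a missing key inserts an empty list, whereas .get() leaves the dict untouched.

```python
from collections import defaultdict

# Desired contents from the example above, loaded into a defaultdict(list).
finaldict = defaultdict(list)
finaldict.update({
    ('foo', 'oof'): ['1243', '1453'],
    ('foo', 'rabrab'): ['2323'],
    ('barbar', 'duh'): ['6452'],
})

shared = finaldict[('foo', 'oof')]        # IDs shared by the pair
missing = finaldict.get(('foo', 'duh'))   # None; .get() does not insert the key
size_before = len(finaldict)
empty = finaldict[('foo', 'duh')]         # empty list; indexing inserts the key
size_after = len(finaldict)
```

So if "return nothing" should not grow the dict, .get() or an "in" test is probably the safer way to query it.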
I've been doing it as below, but is there a simpler way of building finaldict?
from collections import defaultdict

query = ['foo','barbar']

# dict1: word from the first file -> IDs
# dict2: ID -> words from the second file
dict1 = defaultdict(list)
dict2 = defaultdict(list)
# NB: 'foo' has '1234' here, which does not match the '1243' key in dict2
dict1['foo'] = ['1234','1453','2323']
dict1['bar'] = ['5230']
dict1['barbar'] = ['6452']
dict2['1243'] = ['oof']
dict2['1453'] = ['oof']
dict2['4239'] = ['rba']
dict2['2323'] = ['rabrab']
dict2['6452'] = ['duh']

# query word -> ID -> words from the second file with that ID
tt = defaultdict(defaultdict)
for p in sorted(query):
    for ss in sorted(dict1[p]):
        if dict2[ss]:  # skip IDs that have no words in the second file
            tt[p][ss] = dict2[ss]

# (word, word) pair -> set of shared IDs
finaldict = defaultdict(set)
for src in tt:
    for ss in tt[src]:
        for trg in tt[src][ss]:
            finaldict[(src, trg)].add(ss)
print finaldict[('foo','oof')]
The above code outputs:
>>> print finaldict[('foo','oof')]
set(['1453'])
>>> for i in finaldict:
...     print i, finaldict[i]
...
('foo', 'rabrab') set(['2323'])
('barbar', 'duh') set(['6452'])
('foo', 'oof') set(['1453'])
{(k1, v): k2 for k1 in dict1 for k2 in dict2
 for v in dict2[k2] if k2 in dict1[k1]}
{('barbar', 'duh'): '6452', ('foo', 'oof'): '1453', ('foo', 'rabrab'): '2323'}
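Note that the comprehension above keeps only one ID per pair, because a later ID overwrites an earlier one under the same key, and it loops over all of dict1 rather than just the query words. A minimal sketch that collects every shared ID in a single pass instead, assuming the same data as in the question, except that 'foo' is given '1243' (as in the description) rather than '1234' (as in the question's code):

```python
from collections import defaultdict

query = ['foo', 'barbar']

# word -> IDs (first file); assumes '1243' was the intended ID for 'foo'
dict1 = {'foo': ['1243', '1453', '2323'], 'bar': ['5230'], 'barbar': ['6452']}
# ID -> words (second file)
dict2 = {'1243': ['oof'], '1453': ['oof'], '4239': ['rba'],
         '2323': ['rabrab'], '6452': ['duh']}

finaldict = defaultdict(list)
for src in query:
    for id_ in dict1.get(src, []):
        # .get() avoids creating empty entries for IDs missing from dict2
        for trg in dict2.get(id_, []):
            finaldict[(src, trg)].append(id_)

result = dict(finaldict)
```

The lists keep the IDs in the order they appear in dict1. If the same ID could turn up twice for a pair, defaultdict(set) with .add() would be the safer choice.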