I have the code below, which gives me the document frequency for each wordQ1. Now I need the term frequency TF of wordQ1 (the TF in each document DocID) and the DocSize of each DocID.
The data structure looks like this:
FinalHash[wordQ1]={DocID: [TF,DocSize]}
My output should be like the following:
george term frequency in file 999 is 5
Term_List1=[]
DF1 = 0
Term_List1 = FinalHash[wordQ1] # FinalHash is defaultdict
for d in Term_List1: # is the list of all dictionaries where each dict contains {DocID: [TF,DocSize]}
    for i in d.keys(): # in this case i is the docID
        DF1 = DF1+1 # counter to get the document frequency of a term
        print i
print "document frequency of wordQ1 is",DF1 # document frequency
Thanks a lot for your help
You can get at the values in your inner dictionaries by changing how you loop on them. Replace the for i in d.keys() loop with something like this:
for DocID, (TF, DocSize) in d.items():
    # ...
You haven't actually explained what you want to do with the TF and DocSize values, so I've left the actual contents of the loop up to you.
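For instance, if the goal is the output line shown in the question, the loop body might look roughly like the sketch below. The sample contents of FinalHash and the report helper are made up for illustration, and the code uses the print() function form (Python 3) rather than the print statement from the question.

```python
from collections import defaultdict

# Hypothetical sample data in the shape described in the question:
# FinalHash[word] is a list of single-key dicts {DocID: [TF, DocSize]}.
FinalHash = defaultdict(list)
FinalHash["george"] = [{999: [5, 120]}, {1001: [2, 80]}]

def report(word):
    """Return the output lines for one word (helper name is an assumption)."""
    lines = []
    DF = 0  # document frequency counter
    for d in FinalHash[word]:
        # Unpack the key and the [TF, DocSize] pair in one step.
        for DocID, (TF, DocSize) in d.items():
            DF += 1
            lines.append("%s term frequency in file %s is %s" % (word, DocID, TF))
    lines.append("document frequency of %s is %s" % (word, DF))
    return lines

for line in report("george"):
    print(line)
```

DocSize is unpacked but not used here; it is available in the same loop if you need it for normalisation.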
Note that needing a loop here is a bit silly. If you are the one creating the data structure you're working on (rather than getting it from some external source), you should probably change its design to be a single dictionary (at this level) rather than a list of dictionaries, each with a single key. That is, you'd get rid of the Term_List1 level of the data structure, and get d directly at that point.
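A minimal sketch of that flattened design, assuming DocIDs are unique per word. The sample values and the names old_style and merged are invented for illustration:

```python
# Hypothetical existing layout: a list of single-key dicts.
old_style = [{999: [5, 120]}, {1001: [2, 80]}]

# Collapse it into one dict keyed by DocID.
merged = {}
for d in old_style:
    merged.update(d)

# Flattened layout: FinalHash[word] = {DocID: [TF, DocSize], ...}
FinalHash = {"george": merged}

docs = FinalHash["george"]
DF = len(docs)            # document frequency is just the number of DocIDs
TF, DocSize = docs[999]   # direct lookup by DocID, no inner loop needed
print("george term frequency in file 999 is", TF)
```

With this layout the document frequency no longer needs a counter, and looking up one document's TF and DocSize is a single dictionary access.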