python dictionary nlp defaultdict

How to access a list in a nested defaultdict in Python 2.7?


I have the code below, which gives me the document frequency for each wordQ1. Now I need the term frequency (TF) of wordQ1 in each document (DocID), and the DocSize of each DocID.

The data structure looks like this:

FinalHash[wordQ1]={DocID: [TF,DocSize]}

My output should be like the following:

  • The current document is 999
  • The number of tokens contained in this document is 59
  • george document frequency is 142 (I have this done)
  • george term frequency in file 999 is 5

    DF1 = 0
    Term_List1 = FinalHash[wordQ1]  # FinalHash is a defaultdict
    # Term_List1 is a list of dictionaries, each of the form {DocID: [TF, DocSize]}
    for d in Term_List1:
        for i in d.keys():  # i is the DocID
            DF1 = DF1 + 1  # count documents to get the document frequency of the term
            print i

    print "document frequency of wordQ1 is", DF1  # document frequency

Thanks a lot for your help.


Solution

  • You can get at the values in your inner dictionaries by changing how you loop over them. Replace the for i in d.keys() loop with something like this:

    for DocID, (TF, DocSize) in d.items():
        # ...
    

    You haven't actually explained what you want to do with the TF and DocSize values, so I've left the actual contents of the loop up to you.

    Note that needing a loop here is a bit silly. If you are the one creating the data structure you're working on (rather than getting it from some external source), you should probably change its design to be a single dictionary (at this level) rather than a list of dictionaries, each with a single key. That is, you'd get rid of the Term_List1 level of the data structure, so that FinalHash[wordQ1] gives you d directly.
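As a rough illustration of both points above, here is a minimal sketch. It assumes FinalHash maps each word to a list of single-key dictionaries, as the comments in the question's code describe. The sample values for document 999 come from the question's expected output. The second document (1000), the helper name describe, and the name FlatHash are made up for the example. The print calls take a single pre-formatted string so that the snippet runs unchanged on Python 2.7 and Python 3.

```python
from collections import defaultdict

# Hypothetical sample data in the shape described in the question:
# FinalHash[word] is a list of single-key dicts {DocID: [TF, DocSize]}.
FinalHash = defaultdict(list)
FinalHash["george"].append({999: [5, 59]})    # values from the question's expected output
FinalHash["george"].append({1000: [2, 120]})  # invented second document


def describe(word, index):
    """Return (document frequency, [(DocID, TF, DocSize), ...]) for word."""
    rows = []
    # .get() does not trigger the defaultdict factory, so looking up an
    # unknown word does not add an empty entry to the index.
    for d in index.get(word, []):
        # Unpack the key and the two-element list in one step.
        for DocID, (TF, DocSize) in d.items():
            rows.append((DocID, TF, DocSize))
    return len(rows), rows


DF1, rows = describe("george", FinalHash)
for DocID, TF, DocSize in rows:
    print("The current document is %s" % DocID)
    print("The number of tokens contained in this document is %s" % DocSize)
    print("george term frequency in file %s is %s" % (DocID, TF))
print("george document frequency is %s" % DF1)

# Flatter alternative: one dictionary per word, keyed directly by DocID.
FlatHash = defaultdict(dict)
for word, dicts in FinalHash.items():
    for d in dicts:
        FlatHash[word].update(d)

TF, DocSize = FlatHash["george"][999]  # direct lookup, no loop needed
DF_flat = len(FlatHash["george"])      # document frequency is just the size
```

With the flatter layout, the document frequency is just the length of the inner dictionary, and the TF and DocSize of a known document can be read without iterating at all.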