python nested defaultdict trigram

How do I nest an existing dictionary inside another one in Python?


I have a defaultdict nested three levels deep, which I use later for a trigram model.

counts = defaultdict(lambda:defaultdict(lambda:defaultdict(lambda:0)))

Then I have a for loop that goes through a document and counts each letter, along with each bigram and trigram:

counts[letter1][letter2][letter3] += 1

I want to add another layer so that I can specify if the letter is a consonant or a vowel.

I want to be able to run my bigram and trigram counts over the two classes, consonant and vowel, instead of over every letter of the alphabet, but I do not know how to do this.


Solution

  • I'm not sure exactly what you want to do, but I think a flat dict keyed by the combined string of characters (i.e. d['ab'] instead of d['a']['b']) is cleaner than the nested dict approach. The code below also checks whether each bigram/trigram is composed only of vowels, only of consonants, or a mixture.

    CODE:

    from collections import defaultdict
    
    
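    def all_ngrams(text, n):
        # Slide a window of length n over the text, one character at a time.
        ngrams = [text[ind:ind+n] for ind in range(len(text)-(n-1))]
        # Drop any window that contains a space, i.e. one that crosses a word boundary.
        ngrams = [ngram for ngram in ngrams if ' ' not in ngram]
        return ngrams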
    
    
    counts = defaultdict(int)
    text = 'hi hello hi this is hii hello'
    vowels = 'aeiouyAEIOUY'
    consonants = 'bcdfghjklmnpqrstvwxzBCDFGHJKLMNPQRSTVWXZ'
    
    for n in [2, 3]:  # bigrams, then trigrams
        for ngram in all_ngrams(text, n):
            if all(let in vowels for let in ngram):
                print(ngram+' is all vowels')
    
            elif all(let in consonants for let in ngram):
                print(ngram+' is all consonants')
    
            else:
                print(ngram+' is a mixture of vowels/consonants')
    
            # Flat dict keyed by the n-gram string itself.
            counts[ngram] += 1
    
    print(counts)
    

    OUTPUT:

    hi is a mixture of vowels/consonants
    he is a mixture of vowels/consonants
    el is a mixture of vowels/consonants
    ll is all consonants
    lo is a mixture of vowels/consonants
    hi is a mixture of vowels/consonants
    th is all consonants
    hi is a mixture of vowels/consonants
    is is a mixture of vowels/consonants
    is is a mixture of vowels/consonants
    hi is a mixture of vowels/consonants
    ii is all vowels
    he is a mixture of vowels/consonants
    el is a mixture of vowels/consonants
    ll is all consonants
    lo is a mixture of vowels/consonants
    hel is a mixture of vowels/consonants
    ell is a mixture of vowels/consonants
    llo is a mixture of vowels/consonants
    thi is a mixture of vowels/consonants
    his is a mixture of vowels/consonants
    hii is a mixture of vowels/consonants
    hel is a mixture of vowels/consonants
    ell is a mixture of vowels/consonants
    llo is a mixture of vowels/consonants
    defaultdict(<type 'int'>, {'el': 2, 'his': 1, 'thi': 1, 'ell': 2, 'lo': 2, 'll': 2, 'ii': 1, 'hi': 4, 'llo': 2, 'th': 1, 'hel': 2, 'hii': 1, 'is': 2, 'he': 2})
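
If the goal is to count bigrams and trigrams over the two classes (consonant vs. vowel) rather than over individual letters, one possible approach is to map each letter to a class label first and then count n-grams of labels. The sketch below is only an illustration under a few assumptions: the labels 'C' and 'V' and the helper names letter_class and class_ngram_counts are made up for this example, 'y' is treated as a vowel as in the code above, and every other alphabetic character is treated as a consonant.

```python
from collections import defaultdict

# Assumption: 'y' counts as a vowel, matching the vowels string in the answer above.
VOWELS = set('aeiouy')


def letter_class(ch):
    """Return 'V' for a vowel, 'C' for any other letter, None for non-letters."""
    if not ch.isalpha():
        return None
    return 'V' if ch.lower() in VOWELS else 'C'


def class_ngram_counts(text, n):
    """Count n-grams over the C/V class of each letter instead of the letter itself."""
    counts = defaultdict(int)
    for i in range(len(text) - n + 1):
        classes = [letter_class(ch) for ch in text[i:i + n]]
        if None in classes:  # skip windows that span a space or punctuation
            continue
        counts[''.join(classes)] += 1
    return counts


text = 'hi hello hi this is hii hello'
bigrams = class_ngram_counts(text, 2)
trigrams = class_ngram_counts(text, 3)
print(dict(bigrams))
print(dict(trigrams))
```

With this text, the bigram counts come out as CV: 8, VC: 4, CC: 3, VV: 1. If the nested layout from the question is preferred, the same class labels could be used as keys instead, e.g. counts[c1][c2][c3] += 1 with each key being 'C' or 'V'.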