Tags: python, python-3.x, dictionary, n-gram

Given a text, count the occurrences of every pair of consecutive words



Input:

Once upon a time a time this upon a


Output:

dictionary {
    'Once upon': 1,
    'upon a': 2,
    'a time': 2,
    'time a': 1,
    'time this': 1,
    'this upon': 1
}


CODE:

def countTuples(path):
    dic = dict()
    with codecs.open(path, 'r', 'utf-8') as f:
        for line in f:
            s = line.split()
            for i in range (0, len(s)-1):
                dic[str(s[i]) + ' ' + str(s[i+1])] += 1
    return dic

I am getting this error:

File "C:/Users/user/Anaconda3/hw2.py", line 100, in countTuples
    dic[str(s[i]) + ' ' + str(s[i+1])] += 1
TypeError: list indices must be integers or slices, not str

If I replace += 1 with = 1, everything works fine. I guess the problem is that I am trying to read the value of an entry that doesn't exist yet?

What can I do to fix this?


Solution

  • You can use a defaultdict to make your solution work. A defaultdict takes a factory function (here int) that supplies the default value for any missing key; int() returns 0. This lets you apply += 1 to a key that has not been explicitly created yet:

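    import codecs
    from collections import defaultdict

    def countTuples(path):
        # defaultdict(int) yields 0 for a missing key, so += 1 works on first use
        dic = defaultdict(int)
        with codecs.open(path, 'r', 'utf-8') as f:
            for line in f:
                s = line.split()
                # pair each word with the word that follows it on the same line
                for i in range(len(s) - 1):
                    dic[s[i] + ' ' + s[i+1]] += 1
        return dic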
    
    >>> {'Once upon': 1,
         'a time': 2,
         'this upon': 1,
         'time a': 1,
         'time this': 1,
         'upon a': 2}
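
To check the idea without a file on disk, here is a minimal sketch that runs the same counting logic on a string. The name count_pairs and the use of zip are my own choices for illustration, not part of the original code. It also shows what a plain dict does when += hits a missing key: it raises a KeyError. The TypeError about list indices in the question's traceback suggests that dic was bound to a list when that code ran, but that is only a guess from the message.

```python
from collections import defaultdict

def count_pairs(text):
    """Count each pair of consecutive words in a string (hypothetical helper)."""
    counts = defaultdict(int)  # a missing key starts at int() == 0
    for line in text.splitlines():
        words = line.split()
        # zip pairs each word with its successor: (w0, w1), (w1, w2), ...
        for first, second in zip(words, words[1:]):
            counts[first + ' ' + second] += 1
    return dict(counts)  # convert to a plain dict for a cleaner printout

# A plain dict fails on += for a missing key, because the old value
# has to be read before 1 can be added to it.
plain = {}
try:
    plain['Once upon'] += 1
except KeyError as exc:
    error_name = type(exc).__name__

result = count_pairs('Once upon a time a time this upon a')
print(error_name)
print(result)
```

Like the function above, this only pairs words within a line, so a pair that spans a line break is not counted.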