Is defaultdict() the correct choice?

EDIT: mistake fixed

The idea is to read text from a file, clean it, and pair consecutive words (not permutations):

import string

file = f.read()
words = [word.strip(string.punctuation).lower() for word in file.split()]
pairs = [(words[i]+" " + words[i+1]).split() for i in range(len(words)-1)]

Then, for each pair, create a list of all the possible individual words that can follow that pair throughout the text. The dict will look like

[ConsecWordPair]:[listOfFollowers]

Thus, referencing the dictionary for a given pair will return all of the words that can follow that pair. E.g.

wordsThatFollow[('she', 'was')]
>> ['alone', 'happy', 'not']

My algorithm to achieve this involves a defaultdict(list)...

from collections import defaultdict

wordsThatFollow = defaultdict(list)

for i in range(len(words)-1):
    try:
        # pairs overlap, want second word of next pair
        # wordsThatFollow[tuple(pairs[i])] = pairs[i+1][1]
        wordsThatFollow[tuple(pairs[i])].update(pairs[i+1][1][0])  # EDIT
    except Exception:
        pass

I'm not so worried about the value error I have to circumvent with the 'try-except' (unless I should be). The problem is that the algorithm only successfully returns one of the followers:

wordsThatFollow[('she', 'was')]
>> ['not']

Sorry if this post is bad for the community; I'm figuring things out as I go ^^

Solution

Your problem is that you are always overwriting the value, when you really want to append to the list that is already there:

# Instead of this
wordsThatFollow[tuple(pairs[i])] = pairs[i+1][1]

# Do this
wordsThatFollow[tuple(pairs[i])].append(pairs[i+1][1])
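For reference, here is a minimal end-to-end sketch of the same idea. It is one possible way to write it, not the only one. The function name build_followers and the sample text are made up for illustration, and dropping tokens that are empty after stripping punctuation is an assumption that the question's code does not make. Iterating with zip over three offset views of the word list means the loop never indexes past the end, so the try-except should no longer be needed.

```python
import string
from collections import defaultdict


def build_followers(text):
    """Map each pair of consecutive words to the list of words that follow it."""
    # Same cleaning as in the question.
    words = [w.strip(string.punctuation).lower() for w in text.split()]
    # Assumption: tokens that were pure punctuation are dropped.
    words = [w for w in words if w]

    followers = defaultdict(list)
    # zip stops at the shortest input, so the last two words never start
    # a (first, second, third) triple and nothing runs off the end.
    for first, second, third in zip(words, words[1:], words[2:]):
        followers[(first, second)].append(third)
    return followers


# Hypothetical sample text, chosen to mirror the example in the question.
sample = "She was alone. She was happy, she was not."
words_that_follow = build_followers(sample)
print(words_that_follow[("she", "was")])
```

Because the values are lists, repeated followers are kept in the order they appear; if duplicates are unwanted, defaultdict(set) with .add() is a possible alternative.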