I'm working on a small piece of code and have run into a roadblock. Is it possible to find the most common character that follows a specific group of characters?
For example, say I have the following sentence:
"fishies are super neat, a fish is a good pet. also, fishing is for dads."
How could I determine, for example, the most common character that occurs after the fragment "fish"?
In this specific example, doing it by hand, I get something like this:
{"i": 2, " ": 1}
Currently, I have this chunk of code written to grab the "fish" portion of the word:
b = Class(n, 'file.txt')
ngrams = [b.file[i:i+n] for i in range(len(b.file) - n + 1)]  # stop early so every chunk has n characters
With n = 4, this breaks the text into chunks of 4 like so: ['fish', 'ishi', 'shie', 'hies', 'ies ', 'es a', ...]
My goal is to combine these two thoughts so that I can print something that looks like the following:
{'fish' : {'i':2, ' ':1} ..... }
I also currently have a defaultdict defined in __init__ like so: self.counts = defaultdict(lambda: defaultdict(int))
This is the closest I can get to achieving my desired solution, although I am unsure how to grab the individual characters that follow and how to count those characters:
b.counts = {i : { j : 5 for j in ngrams } for i in ngrams }
The 5 is merely a placeholder so I could see what printed, and j in ngrams was also a placeholder for the same reason. Any input or ideas from anyone would be greatly appreciated!
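import re

raw_string = "fishies are super neat, a fish is a good pet. also, fishing is for dads."
keys = ['fish', 'ishi', 'shie', 'hies']
out = {}
for key in keys:
    # Collect the character that follows each occurrence of the chunk.
    # re.escape protects chunks that contain regex metacharacters such as '.'.
    followers = [match[-1] for match in re.findall(re.escape(key) + '.', raw_string)]
    # Count how often each following character appears.
    out[key] = {char: followers.count(char) for char in followers}
print(out)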
Output:
{'fish': {'i': 2, ' ': 1}, 'ishi': {'e': 1, 'n': 1}, 'shie': {'s': 1}, 'hies': {' ': 1}}
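If you want to keep the defaultdict(lambda: defaultdict(int)) from the question, the counts can probably be filled in a single pass over the text instead of running one regex per chunk. The following is only a minimal sketch under that assumption: follower_counts is a hypothetical helper name, not something from your class, and it takes the text as a plain string rather than reading file.txt. Unlike re.findall, which skips overlapping matches, walking the indices also counts chunks that overlap themselves.

```python
from collections import defaultdict


def follower_counts(text, n):
    """Map each n-character chunk to counts of the character that follows it.

    Hypothetical helper for illustration; not part of the original class.
    """
    counts = defaultdict(lambda: defaultdict(int))
    # Stop n characters before the end so text[i + n] always exists.
    for i in range(len(text) - n):
        chunk = text[i:i + n]
        following = text[i + n]
        counts[chunk][following] += 1
    return counts


text = "fishies are super neat, a fish is a good pet. also, fishing is for dads."
counts = follower_counts(text, 4)

print(dict(counts["fish"]))  # {'i': 2, ' ': 1}

# The most common character after "fish" is the key with the largest count.
most_common = max(counts["fish"], key=counts["fish"].get)
print(most_common)  # i
```

Looking up a chunk that never appears (for example counts["zzzz"]) creates an empty inner dict because of the defaultdict, so check `chunk in counts` first if that matters for your output.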