Hello, I have a specific string and I am trying to calculate its edit distance to each line of a text file. I also want to count how many times each line occurs, and then sort the results.
data = "Hello"
The text file I am comparing it against, named xfile, contains:
"hola"
"how are you"
"what is up"
"everything good?"
"hola"
"everything good?"
"what is up?"
"okay"
"not cool"
"not cool"
I want to build a dictionary that maps every line of xfile to its edit distance from the string and to its count. For now, I am able to get each key and its distance, but not its count. Can someone please suggest how to do this?
My code is:
import editdistance
data = "Hello"
utterances = {}
for line in readFile:  # readFile: the opened xfile (not shown here)
    dist = editdistance.eval(data, line)
    utterances[line] = dist
For every utterance, you can store a dictionary containing its distance and its count:
import editdistance
data = 'Hello'
utterances = {}
xlist = [
    'hola',
    'how are you',
    'what is up',
    'everything good?',
    'hola',
    'everything good?',
    'what is up?',
    'okay',
    'not cool',
    'not cool',
]
for line in xlist:
    if line not in utterances:
        # First occurrence: compute the distance once and start the count.
        utterances[line] = {
            'distance': editdistance.eval(data, line),
            'count': 1
        }
    else:
        # Repeated line: only the count changes.
        utterances[line]['count'] += 1
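As a possible variation, the counting can be handed to collections.Counter so that each distinct line is measured only once. The sketch below is an illustration rather than a drop-in replacement: `levenshtein` and `summarize` are hypothetical helper names, and `levenshtein` is a plain dynamic-programming stand-in used here so the example runs without the editdistance package (swap in `editdistance.eval` if you have it installed). It also strips each line, on the assumption that lines read from a file carry a trailing newline that would otherwise change both the key and the distance.

```python
from collections import Counter


def levenshtein(a, b):
    """Plain dynamic-programming edit distance (stand-in for editdistance.eval)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute, free on a match
            ))
        previous = current
    return previous[-1]


def summarize(data, lines):
    """Map each distinct line to its distance from `data` and its count."""
    counts = Counter(line.strip() for line in lines)
    return {
        line: {'distance': levenshtein(data, line), 'count': count}
        for line, count in counts.items()
    }


# A shortened sample; 'hola\n' mimics a line read straight from a file.
xlist = ['hola', 'how are you', 'hola\n', 'not cool', 'not cool']
utterances = summarize('Hello', xlist)
print(utterances['hola'])  # {'distance': 4, 'count': 2}
```

Note that the comparison is case-sensitive, so 'H' and 'h' count as a substitution; lowercase both strings first if that is not what you want.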
Then if you need the utterances sorted by distance or count you can use an OrderedDict:
from collections import OrderedDict
sorted_by_distance = OrderedDict(sorted(utterances.items(), key=lambda t: t[1]['distance']))
sorted_by_count = OrderedDict(sorted(utterances.items(), key=lambda t: t[1]['count']))
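Two details may be worth adjusting here. Both sorts above are ascending, so the count ordering puts the rarest utterances first; and since Python 3.7 a plain dict keeps insertion order, so OrderedDict is optional. A small sketch of sorting by count (most frequent first) with ties broken by the smaller distance, using a few sample entries in the same shape as the `utterances` dictionary above:

```python
# Sample entries shaped like the `utterances` dictionary above, with
# case-sensitive edit distances from 'Hello'.
utterances = {
    'okay': {'distance': 5, 'count': 1},
    'not cool': {'distance': 7, 'count': 2},
    'hola': {'distance': 4, 'count': 2},
}

# Most frequent first; ties on count are broken by the smaller distance.
by_count_then_distance = dict(sorted(
    utterances.items(),
    key=lambda item: (-item[1]['count'], item[1]['distance']),
))

print(list(by_count_then_distance))  # ['hola', 'not cool', 'okay']
```

Negating the count inside the key tuple keeps the distance ascending while the count is descending; passing reverse=True instead would flip both.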