I would like to ask a question. Is there an algorithm or tool that can help me find associations between words? For example, consider the following groups of sentences:
(1)
"My phone is on the table"
"I cannot find the charger" (no reference to the phone)
(2)
"My phone is on the table"
"I cannot find the phone's charger"
What I would like to do is find a connection, probably a semantic one, that lets me say that the two sentences in group (1) are about the same topic (the phone), because the two terms (phone and charger) generally belong together. The same goes for group (2). In group (1) I need something that connects "phone" to "charger", since the second sentence never mentions the phone. I was thinking of using Word2vec, but I am not sure whether it can do this. Do you have any suggestions for algorithms that can determine topic similarity, i.e. recognize sentences that are phrased differently but share the same topic?
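Since the question mentions Word2vec, here is a minimal sketch of one common way to use word vectors for this: average the vectors of the words in each sentence and compare the averages with cosine similarity. It assumes you already have pre-trained vectors in some word-to-vector mapping; the helper names (`sentence_vector`, `cosine`, `topic_similarity`) are hypothetical, and lowercasing and stripping punctuation are simplifying assumptions (some pre-trained models are case-sensitive).

```python
import math
from typing import Mapping, Optional, Sequence


def sentence_vector(sentence: str, vectors: Mapping[str, Sequence[float]]) -> Optional[list]:
    """Average the vectors of the words that have an embedding (hypothetical helper)."""
    # Assumption: lowercase and strip surrounding punctuation before lookup
    words = [w.strip(".,!?\"").lower() for w in sentence.split()]
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return None  # none of the words is in the vocabulary
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def topic_similarity(s1: str, s2: str, vectors: Mapping[str, Sequence[float]]) -> float:
    """Cosine similarity of the averaged word vectors of two sentences."""
    v1, v2 = sentence_vector(s1, vectors), sentence_vector(s2, vectors)
    if v1 is None or v2 is None:
        return 0.0  # nothing to compare
    return cosine(v1, v2)
```

With gensim, for instance, `KeyedVectors.load_word2vec_format(path, binary=True)` (from `gensim.models`) loads vectors into an object that supports `word in kv` and `kv[word]`, so it can be passed as `vectors`. Averaging weights every word equally, so frequent words such as "the" or "is" can dilute the result; filtering stop words first is a common refinement.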
In Python, the standard library's `difflib` module provides a `SequenceMatcher` that you can use:
```python
from difflib import SequenceMatcher

def similar(a, b):
    # Similarity ratio between 0.0 (nothing in common) and 1.0 (identical)
    return SequenceMatcher(None, a, b).ratio()
```
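A small self-contained usage sketch on the question's sentences (it repeats `similar` so it runs on its own). Keep in mind that on plain strings the ratio is computed from shared characters, so it measures surface overlap rather than meaning and will not by itself link "phone" to "charger". `SequenceMatcher` accepts any sequences of hashable items, so passing lists of words compares whole words instead of characters.

```python
from difflib import SequenceMatcher


def similar(a, b):
    # Ratio = 2 * matched elements / total elements of both sequences
    return SequenceMatcher(None, a, b).ratio()


first = "My phone is on the table"
second_1 = "I cannot find the charger"
second_2 = "I cannot find the phone's charger"

# Character-level comparison of the two groups from the question
print(similar(first, second_1))
print(similar(first, second_2))

# Word-level comparison: 5 shared words out of 5 + 6, i.e. 10/11 (about 0.91)
print(similar(second_1.split(), second_2.split()))
```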
If you want to implement your own algorithm, I would suggest the Levenshtein distance: it counts the minimum number of single-character insertions, deletions and substitutions needed to turn one string (sentence) into another, which might be useful. I implemented it myself like this for two strings:
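```python
def levenshtein(str1, str2):
    # edits[i][j] = edit distance between str2[:i] and str1[:j];
    # every row starts as 0..len(str1), which is correct for row 0
    edits = [[x for x in range(len(str1) + 1)] for y in range(len(str2) + 1)]
    # First column: turning str2[:i] into "" takes i deletions
    for i in range(len(str2) + 1):
        edits[i][0] = i
    for i in range(1, len(str2) + 1):
        for j in range(1, len(str1) + 1):
            if str2[i-1] == str1[j-1]:
                # Characters match: no extra cost
                edits[i][j] = edits[i-1][j-1]
            else:
                # 1 + cheapest of substitution, deletion, insertion
                edits[i][j] = 1 + min(edits[i-1][j-1], edits[i-1][j],
                                      edits[i][j-1])
    return edits[-1][-1]
```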
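Because the recurrence only indexes and compares elements, the same idea works on lists of words as well as on strings, which is closer to comparing sentences. The sketch below keeps the same recurrence but stores only two rows of the table; `tokenize` and `normalized_word_distance` are hypothetical helpers, and the simple whitespace tokenization is an assumption.

```python
def levenshtein(seq1, seq2):
    """Edit distance between two indexable sequences (strings or word lists)."""
    prev = list(range(len(seq1) + 1))  # row 0: distance from "" to seq1[:j]
    for i in range(1, len(seq2) + 1):
        curr = [i] + [0] * len(seq1)  # column 0: i deletions
        for j in range(1, len(seq1) + 1):
            if seq2[i - 1] == seq1[j - 1]:
                curr[j] = prev[j - 1]  # match: no extra cost
            else:
                # 1 + cheapest of substitution, deletion, insertion
                curr[j] = 1 + min(prev[j - 1], prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def tokenize(sentence):
    # Assumption: split on whitespace, lowercase, strip surrounding punctuation
    return [w.strip(".,!?\"").lower() for w in sentence.split()]


def normalized_word_distance(s1, s2):
    """Word-level edit distance scaled to [0, 1] by the longer sentence."""
    a, b = tokenize(s1), tokenize(s2)
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


# One inserted word ("phone's") separates these two sentences
print(normalized_word_distance("I cannot find the charger",
                               "I cannot find the phone's charger"))
```

Like `SequenceMatcher`, this still measures wording rather than topic: "phone" and "charger" count as different words no matter how related they are.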
[EDIT] Since you want to check whether the sentences are about a similar topic, I would suggest any of the following algorithms (all are fairly easy):