I'm trying to use a shortest path function to find the distance between strings in a graph. The problem is that sometimes there are close matches that I want to count. For example, I would like "communication" to count as "communications", or "networking device" to count as "network device". Is there a way to do this in Python? (e.g., extract the root of words, compute a string distance, or perhaps use a Python library that already has word-form relationships like plural/gerund/misspelled/etc.) My problem right now is that my process only works when there is an exact match for every item in my database, which is difficult to keep clean.
For example:
List_of_tags_in_graph = ['A', 'list', 'of', 'tags', 'in', 'graph']
given_tag = 'lists'
if min_fuzzy_string_distance_measure(given_tag, List_of_tags_in_graph) < threshold:
    index_of_min = index_of_min_fuzzy_match(given_tag, List_of_tags_in_graph)
    given_tag = List_of_tags_in_graph[index_of_min]
    # ... then use given_tag in the graph calculation because now I know it matches ...
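For concreteness, here is one way I imagine this could work, using the standard library's difflib as the fuzzy matcher (the 0.6 cutoff is just an arbitrary placeholder for my threshold):
import difflib

List_of_tags_in_graph = ['A', 'list', 'of', 'tags', 'in', 'graph']
given_tag = 'lists'

# get_close_matches returns up to n candidates whose similarity
# ratio is at least `cutoff`; an empty list means no close match
matches = difflib.get_close_matches(given_tag, List_of_tags_in_graph, n=1, cutoff=0.6)
if matches:
    given_tag = matches[0]  # snap to the closest known tag
# ... then use given_tag in the graph calculation ...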
Any thoughts on an easy or quick way to do this? Or perhaps a different way to think about accepting close-match strings ... or perhaps just better error handling when strings don't match?
Try using nltk's WordNetLemmatizer; it is designed to extract the root form of words. https://www.nltk.org/_modules/nltk/stem/wordnet.html
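For example (a minimal sketch; it assumes the WordNet corpus has been downloaded, and the outputs in the comments are what the default WordNet data produces):
import nltk
from nltk.stem import WordNetLemmatizer

nltk.download('wordnet')  # one-time download of the WordNet corpus

lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize('communications'))       # -> 'communication'
print(lemmatizer.lemmatize('lists'))                # -> 'list'
print(lemmatizer.lemmatize('networking', pos='v'))  # -> 'network'
Note that lemmatize treats words as nouns by default, so you may need to pass pos='v' for verb forms (like the gerunds you mention) to get the root you expect.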