Along the same lines as the solution provided in this link, I am trying to get all leaf words of one stem word. I am using the community-contributed (@Divyanshu Srivastava) package get_word_forms.
Imagine I have a shorter sample word list as follows:
my_list = [' jail', ' belief',' board',' target', ' challenge', ' command']
If I do it manually, I have to go word by word, which is very time-consuming for a list of 200 words:
get_word_forms("command")
and get the following output:
{'n': {'command',
'commandant',
'commandants',
'commander',
'commanders',
'commandership',
'commanderships',
'commandment',
'commandments',
'commands'},
'a': set(),
'v': {'command', 'commanded', 'commanding', 'commands'},
'r': set()}
'n' is noun, 'a' is adjective, 'v' is verb, and 'r' is adverb.
If I try to reverse-stem the entire list in one go:
[get_word_forms(word) for word in sample]
I get only empty sets:
[{'n': set(), 'a': set(), 'v': set(), 'r': set()},
{'n': set(), 'a': set(), 'v': set(), 'r': set()},
{'n': set(), 'a': set(), 'v': set(), 'r': set()},
{'n': set(), 'a': set(), 'v': set(), 'r': set()},
{'n': set(), 'a': set(), 'v': set(), 'r': set()},
{'n': set(), 'a': set(), 'v': set(), 'r': set()},
{'n': set(), 'a': set(), 'v': set(), 'r': set()}]
I think I am failing at saving the output to the dictionary. Eventually, I would like my output to be a single flat list, not broken down into noun, adjective, adverb, or verb, something like:
['command','commandant','commandants', 'commander', 'commanders', 'commandership',
'commanderships','commandment', 'commandments', 'commands','commanded', 'commanding', 'commands', 'jail', 'jailer', 'jailers', 'jailor', 'jailors', 'jails', 'jailed', 'jailing'.....] .. and so on.
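The empty sets come from the leading spaces in the list entries: ' command' is not the same string as 'command', so the lookup finds nothing. One solution, using nested list comprehensions after stripping the forgotten spaces: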
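from word_forms.word_forms import get_word_forms

# Collect every non-empty set of forms (noun, adjective, verb, adverb) for each stripped word
all_words = [setx for word in my_list for setx in get_word_forms(word.strip()).values() if len(setx)]
# Flatten the list of sets
all_words = [word for setx in all_words for word in setx]
# Remove the repetitions and sort the result
all_words = sorted(set(all_words))
print(all_words)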
['belief', 'beliefs', 'believabilities', 'believability', 'believable', 'believably', 'believe', 'believed', 'believer', 'believers', 'believes', 'believing', 'board', 'boarded', 'boarder', 'boarders', 'boarding', 'boards', 'challenge', 'challengeable', 'challenged', 'challenger', 'challengers', 'challenges', 'challenging', 'command', 'commandant', 'commandants', 'commanded', 'commander', 'commanders', 'commandership', 'commanderships', 'commanding', 'commandment', 'commandments', 'commands', 'jail', 'jailed', 'jailer', 'jailers', 'jailing', 'jailor', 'jailors', 'jails', 'target', 'targeted', 'targeting', 'targets']
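If this is needed in more than one place, the same steps (strip, look up, flatten, de-duplicate, sort) could be wrapped in a small helper. The following is only a sketch: the name all_word_forms and the lookup parameter are hypothetical and not part of any library. The lookup function is passed in so the helper does not depend on a particular package; it is assumed to return a dict of sets keyed by part of speech, as get_word_forms does in the output shown in the question.

```python
from typing import Callable, Dict, Iterable, List, Set


def all_word_forms(
    words: Iterable[str],
    lookup: Callable[[str], Dict[str, Set[str]]],
) -> List[str]:
    """Return a sorted, de-duplicated list of every form found for the words.

    `lookup` is assumed to map a word to a dict of sets, one per part of
    speech (for example {'n': {...}, 'a': set(), 'v': {...}, 'r': set()}).
    """
    collected: Set[str] = set()
    for word in words:
        cleaned = word.strip()  # drop stray leading/trailing whitespace
        if not cleaned:
            continue  # skip entries that were only whitespace
        for forms in lookup(cleaned).values():
            collected.update(forms)  # empty sets simply add nothing
    return sorted(collected)
```

Assuming get_word_forms has been imported, it would be called as all_word_forms(my_list, get_word_forms). Collecting into a set from the start avoids the intermediate list of sets and the separate de-duplication step.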