I have a dictionary that I created and saved as a text file. I open it like this:
with open(pathDoc+'/WordsDictionary.txt', 'r+', encoding="utf8") as inf:
    wordsDictionary = eval(inf.read())
The saved format is: {'word1':'tag1', 'word2':'tag2'}
Given a sentence, I want to remove the words that belong to a certain tag set. This is essentially what stop word removal does in NLTK, but for a language that NLTK does not support. An example is given below.
wordsDictionary = {'word1':'tag1', 'word2':'tag2', 'word3':'tag3'}
Sentence = "word1 word2 word3 word2 word1"
# I want to remove words that belong to 'tag2' type
FinalSentence = "word1 word3 word1"
How can I generate FinalSentence?
Thanks!
@haifzhan's solution will get you there if each tag has only one word. If you need more than one word per tag, however, here's another solution:
sentence = "word1 word2 word3 word2 word1 word4 word5 word1"
tags = {'tag1': ['word1'], 'tag2': ['word4', 'word2'], 'tag3': ['word3']} # Set a dictionary of lists based on tags
# Default to an empty list so a missing tag does not raise a TypeError
final_sentence = ' '.join([word for word in sentence.split() if word not in tags.get('tag2', [])])
# Output:
final_sentence
'word1 word3 word1 word5 word1'
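If you would rather keep the {'word': 'tag'} format from the question, something along these lines might work. It is only a sketch: remove_tagged_words and invert_by_tag are names made up for illustration, and it assumes words are separated by whitespace and that words missing from the dictionary should be kept.

```python
from collections import defaultdict


def remove_tagged_words(sentence, words_dictionary, unwanted_tags):
    """Drop every whitespace-delimited word whose tag is in unwanted_tags."""
    unwanted = set(unwanted_tags)
    # Words missing from the dictionary map to None and are kept.
    kept = [w for w in sentence.split() if words_dictionary.get(w) not in unwanted]
    return ' '.join(kept)


def invert_by_tag(words_dictionary):
    """Turn {'word': 'tag'} into {'tag': ['word', ...]}."""
    tags = defaultdict(list)
    for word, tag in words_dictionary.items():
        tags[tag].append(word)
    return dict(tags)


words_dictionary = {'word1': 'tag1', 'word2': 'tag2', 'word3': 'tag3'}
sentence = "word1 word2 word3 word2 word1"

final_sentence = remove_tagged_words(sentence, words_dictionary, ['tag2'])
tags = invert_by_tag(words_dictionary)
```

The first helper filters directly against the original dictionary, so it handles any number of words per tag without restructuring the data. The second builds the tag-to-words shape used in the tags dictionary above, in case you need that elsewhere.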
If your words are not delimited by spaces, though, you'll need a different approach, maybe something like this:
for word in tags.get('tag2', []):
    # Remove every occurrence of the word, wherever it appears
    sentence = sentence.replace(word, '')
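A possible variation on the same idea uses a single regular expression instead of repeated replace calls. This is a sketch, not a drop-in answer: strip_words is a hypothetical helper, and the unspaced sentence is invented for the example. Sorting the alternatives longest first means that when two listed words overlap (say 'word1' and 'word10'), the longer one is matched first. Without delimiters, though, any listed word that happens to be a substring of an unlisted word will still be removed.

```python
import re


def strip_words(text, words):
    """Remove every occurrence of the given words from text in one pass."""
    if not words:
        return text
    # Longest alternatives first, so overlapping entries prefer the longer match.
    ordered = sorted(set(words), key=len, reverse=True)
    pattern = re.compile('|'.join(re.escape(w) for w in ordered))
    return pattern.sub('', text)


tags = {'tag1': ['word1'], 'tag2': ['word4', 'word2'], 'tag3': ['word3']}
sentence = "word1word2word3word2word1word4word5word1"  # no delimiters (made-up input)

result = strip_words(sentence, tags.get('tag2', []))
```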