Tags: python-3.x, nlp, stop-words, pos-tagger

How to remove words from a sentence using a dictionary as a reference


I have created a dictionary and saved it as a text file. I open it like this:

    # Read the whole file and evaluate its contents as a Python dict literal
    with open(pathDoc + '/WordsDictionary.txt', 'r', encoding="utf8") as inf:
        wordsDictionary = eval(inf.read())

The saved file looks like this:

    {'word1':'tag1', 'word2':'tag2'}
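
Because the file holds a plain Python dict literal, it can also be parsed with ast.literal_eval from the standard library, which, unlike eval, only accepts literals and will not execute arbitrary code that ends up in the file. A minimal sketch, assuming the file contains exactly one dict literal in the format above; the helper name load_words_dictionary is hypothetical:

```python
import ast


def load_words_dictionary(path):
    """Read a file containing a single dict literal such as
    {'word1':'tag1', 'word2':'tag2'} and return it as a dict.

    ast.literal_eval only accepts Python literals (strings, numbers,
    dicts, lists, ...), so unlike eval it will not run code.
    """
    with open(path, 'r', encoding='utf8') as inf:
        data = ast.literal_eval(inf.read())
    if not isinstance(data, dict):
        raise ValueError(f"{path} does not contain a dict literal")
    return data


# The parsing step on its own, using the saved format shown above:
words_dictionary = ast.literal_eval("{'word1':'tag1', 'word2':'tag2'}")
print(words_dictionary)  # {'word1': 'tag1', 'word2': 'tag2'}
```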

Given a sentence, I want to remove the words that belong to a certain set of tags. This is essentially what stop-word removal does in NLTK, but the language I'm working with is not supported by the NLTK toolkit. Here is an example:

    wordsDictionary = {'word1':'tag1', 'word2':'tag2', 'word3':'tag3'}
    Sentence = "word1 word2 word3 word2 word1"
    # I want to remove the words tagged 'tag2'
    FinalSentence = "word1 word3 word1"

How can I generate FinalSentence?

Thanks!


Solution

@haifzhan's solution will get you there when each tag has only one word. If you need more than one word per tag, here's another solution:

    sentence = "word1 word2 word3 word2 word1 word4 word5 word1"
    # A dictionary of lists, keyed by tag
    tags = {'tag1': ['word1'], 'tag2': ['word4', 'word2'], 'tag3': ['word3']}
    
    # Words to drop; a set makes each membership test fast, and the
    # default [] avoids a TypeError if 'tag2' is missing from tags
    to_remove = set(tags.get('tag2', []))
    final_sentence = ' '.join(word for word in sentence.split() if word not in to_remove)
    
    print(final_sentence)
    # Output: word1 word3 word1 word5 word1
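
If you prefer to keep the word-to-tag dictionary from the question rather than building a list of words per tag, you can filter on the tag directly. A minimal sketch, assuming space-delimited sentences and that words missing from the dictionary should be kept; remove_words_by_tag is a hypothetical helper name, and passing a set of tags lets you drop several tag types at once:

```python
def remove_words_by_tag(sentence, words_dictionary, tags_to_remove):
    """Drop every word whose tag (looked up in a word -> tag dict) is in
    tags_to_remove. Words that are not in the dictionary are kept,
    because .get() returns None for them and None is never a tag."""
    drop = set(tags_to_remove)  # set gives fast membership tests
    kept = [word for word in sentence.split()
            if words_dictionary.get(word) not in drop]
    return ' '.join(kept)


words_dictionary = {'word1': 'tag1', 'word2': 'tag2', 'word3': 'tag3'}
sentence = "word1 word2 word3 word2 word1"

print(remove_words_by_tag(sentence, words_dictionary, {'tag2'}))
# word1 word3 word1
print(remove_words_by_tag(sentence, words_dictionary, {'tag2', 'tag3'}))
# word1 word1
```

The lookup goes word to tag, which is the direction the saved dictionary already has, so no inverted structure needs to be built.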
    

If your words are not delimited by spaces, though, you'll need a different approach, for example plain substring replacement:

    for word in tags.get('tag2', []):
        # Delete every occurrence of this word as a plain substring
        sentence = sentence.replace(word, '')
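
The loop above removes one word at a time, so if one removable word is a prefix of another (say 'word2' and 'word20') and the shorter one is processed first, a fragment of the longer one is left behind. A minimal sketch that avoids this by matching all removable words in a single regular expression, longest first; like the loop, it still removes these strings wherever they occur, including inside other words. The function name remove_substrings is hypothetical:

```python
import re


def remove_substrings(text, words):
    """Remove every occurrence of any string in words from text.

    Regex alternatives are tried left to right, so sorting longest
    first makes 'word20' win over its prefix 'word2' at the same
    position. re.escape keeps characters such as '.' or '+' from
    being treated as regex syntax.
    """
    if not words:
        return text
    pattern = '|'.join(re.escape(w) for w in sorted(words, key=len, reverse=True))
    return re.sub(pattern, '', text)


text = "word1word20word2word3"

# One-at-a-time replacement with the shorter word first leaves a stray '0':
naive = text
for w in ['word2', 'word20']:
    naive = naive.replace(w, '')
print(naive)  # word10word3

print(remove_substrings(text, ['word2', 'word20']))  # word1word3
```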