python nlp nltk stop-words

Stopword Removal Dilemma


I am facing a dilemma with the stopword list in NLTK. I am processing user-generated content from a social media platform and removing stopwords with NLTK. However, I want to keep the personal pronouns in users' text, such as "I", "you", and "we", because they are important for my classification task.

Unfortunately, the NLTK stopword list includes these words, so they are removed too, and I need them to be present. How can I solve this problem?


Solution

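  • import nltk
    from nltk.corpus import stopwords

    # nltk.download('stopwords')  # run once if the corpus is not installed yet
    stop_words = stopwords.words('english')
    print(type(stop_words))  # <class 'list'>
    print(len(stop_words))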
    

    The output shows that stop_words is a plain Python list, so you can remove the words you want to keep from it:

    personal_pronouns = ['i', 'you', 'we', 'she', 'he', 'they']  # add any other words you want to keep
    for word in personal_pronouns:
        if word in stop_words:
            stop_words.remove(word)
            print(word + ' removed from the stopword list')
    print(len(stop_words))
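
Once the pronouns are out of the list, the remaining stopwords still have to be filtered from the text. The sketch below shows one way to do that without mutating the list, using a set difference. The helper name remove_stopwords and the short sample_stop_words list are hypothetical: the sample list only stands in for stopwords.words('english') so that the example runs without the NLTK corpus installed.

```python
def remove_stopwords(tokens, stop_words, keep=()):
    """Drop stop words from tokens, except the words listed in keep.

    Matching is case-insensitive; the original casing of kept tokens is preserved.
    """
    keep_set = {w.lower() for w in keep}
    # Effective stopword set: every stop word that is not protected by keep.
    effective = {w.lower() for w in stop_words} - keep_set
    return [t for t in tokens if t.lower() not in effective]


# Assumption: a tiny stand-in for stopwords.words('english'), not the real NLTK list.
sample_stop_words = ["i", "you", "we", "the", "a", "is", "to", "and"]
pronouns_to_keep = ["i", "you", "we", "she", "he", "they"]

tokens = "I think you and the team love NLTK".split()
filtered = remove_stopwords(tokens, sample_stop_words, keep=pronouns_to_keep)
print(filtered)
```

With the real corpus, you would pass stopwords.words('english') as the second argument. Building a set also makes each membership test constant time, which helps when processing a large number of posts.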