python, python-2.7, nltk, wordnet

Extract non-content English language words string - python


I am working on a Python script in which I want to remove common English words such as "the", "an", "and", "for", and many more from a string. Currently I keep a local list of all such words and just call remove() to remove them from the string. I would like a more Pythonic way to achieve this. I have read about nltk and wordnet, but I have no idea whether that is what I should use or how to use it.

Edit

I don't understand why this was marked as a duplicate. My question does not in any way imply that I already knew about stop words and only wanted to know how to use them. The question is about what I can use in my scenario, and the answer to that was stop words, but when I posted this question I didn't know anything about stop words.


Solution

  • I have found that what I was looking for is this:

    from nltk.corpus import stopwords
    my_stop_words = stopwords.words('english')
    

    Now I can remove or replace any word in my list or string that matches an entry in my_stop_words, which is a list.

    For this to work, I had to install NLTK for Python and then use its downloader to fetch the stopwords package.

    The downloader also offers many other packages that are useful in different NLP situations, such as words, brown, and wordnet.
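
The filtering step described in the solution could be sketched roughly as follows. The helper names remove_stop_words and load_nltk_stop_words are hypothetical, not part of NLTK, and the regular-expression tokenizer is an assumption that keeps only letters and apostrophes, so punctuation is dropped. The demo at the bottom uses a small hand-written set so that it runs even without NLTK installed.

```python
import re


def load_nltk_stop_words(language="english"):
    """Return NLTK's stop word list as a set.

    Assumes nltk is installed; fetches the 'stopwords' data if needed.
    """
    import nltk
    from nltk.corpus import stopwords

    nltk.download("stopwords", quiet=True)
    return set(stopwords.words(language))


def remove_stop_words(text, stop_words):
    """Return text without the words found in stop_words (case-insensitive)."""
    # Assumption: a word is a run of letters/apostrophes; punctuation is dropped.
    words = re.findall(r"[A-Za-z']+", text)
    kept = [word for word in words if word.lower() not in stop_words]
    return " ".join(kept)


# Hand-written sample set, used here instead of the NLTK list
sample_stop_words = {"the", "an", "and", "for"}
print(remove_stop_words("The cat and the dog wait for an answer", sample_stop_words))
# prints: cat dog wait answer
```

With NLTK available, passing load_nltk_stop_words() as the second argument should filter against the full English list instead. Converting the list to a set is a design choice: membership tests on a set avoid scanning the whole list for every word.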