python, nlp, stop-words, spacy

Add/remove custom stop words with spacy


What is the best way to add/remove stop words with spaCy? I am using the token.is_stop attribute and would like to make some custom changes to the set. I looked through the documentation but could not find anything about stop words. Thanks!


Solution

  • You can edit them before processing your text like this (see this post):

    >>> import spacy
    >>> nlp = spacy.load("en")
    >>> nlp.vocab["the"].is_stop = False
    >>> nlp.vocab["definitelynotastopword"].is_stop = True
    >>> sentence = nlp("the word is definitelynotastopword")
    >>> sentence[0].is_stop
    False
    >>> sentence[3].is_stop
    True
    

    Note: This seems to work for spaCy versions up to and including v1.8. For newer versions, see the other answers.
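
For newer releases, here is a minimal sketch of the same idea, assuming spaCy v3 is installed. It uses spacy.blank("en") instead of spacy.load("en") so that no trained model has to be downloaded. The helper set_stop_word is a hypothetical name introduced for illustration and is not part of spaCy. It flips the is_stop flag on the vocabulary entry and also updates nlp.Defaults.stop_words so the two stay in step. Treat it as one possible approach rather than the recommended one.

```python
import spacy

# Assumption: spaCy v3. spacy.blank("en") creates a tokenizer-only English
# pipeline, so no model download is required.
nlp = spacy.blank("en")


def set_stop_word(nlp, word, is_stop):
    """Hypothetical helper: mark `word` as a stop word or as a regular word."""
    # Flip the flag on the vocabulary entry; tokens read is_stop from here.
    nlp.vocab[word].is_stop = is_stop
    # Keep the language's default stop word set consistent with the flag.
    if is_stop:
        nlp.Defaults.stop_words.add(word)
    else:
        nlp.Defaults.stop_words.discard(word)


set_stop_word(nlp, "the", False)
set_stop_word(nlp, "definitelynotastopword", True)

doc = nlp("the word is definitelynotastopword")
flags = [(token.text, token.is_stop) for token in doc]
print(flags)
```

The flag is set per vocabulary entry, so a differently cased form such as "The" is a separate entry and may need to be set as well.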