Tags: python-3.x, nlp, stanford-nlp, opennlp

Remove stopwords list from list in Python (Natural Language Processing)


I have been trying to remove stopwords with Python 3, but my code does not seem to work. How can I remove the stopwords from the list below? Its structure is as follows:

    from nltk.corpus import stopwords

    word_split1 = [['amazon','brand','-','solimo','premium','almonds',',','250g','by','solimo'],
                   ['hersheys','cocoa', 'powder', ',', '225g', 'by', 'hersheys'],
                   ['jbl','t450bt','extra','bass','wireless','on-ear','headphones','with','mic','white','by','jbl','and']]

This is the code I tried for removing the stopwords. I would appreciate it if anyone could help me fix it:

    stop_words = set(stopwords.words('english'))

    filtered_words=[]
    for i in word_split1:
        if i not in stop_words:
            filtered_words.append(i)

I get this error:

    Traceback (most recent call last):
    File "<ipython-input-451-747407cf6734>", line 3, in <module>
    if i not in stop_words:
    TypeError: unhashable type: 'list'
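The traceback comes from the membership test itself: a set has to hash a value before it can look it up, and a Python list is unhashable. A quick check, sketched with a small hypothetical stand-in for `set(stopwords.words('english'))` so it runs without downloading the NLTK corpus:

```python
# Hypothetical stand-in for set(stopwords.words('english')), so this runs
# without the NLTK corpus being installed.
stop_words = {'by', 'with', 'and'}

# A single word is a string, which is hashable, so the set lookup works.
token_check = 'by' in stop_words

# A whole inner list is not hashable: the set must hash the candidate
# before looking it up, so this raises TypeError (as in the traceback).
try:
    ['amazon', 'brand', 'by'] in stop_words
    list_check_raised = False
except TypeError:
    list_check_raised = True

print(token_check, list_check_raised)  # True True
```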

Solution

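  • You have a list of lists, so the i in your loop is a whole inner list rather than a single word. Checking membership in a set requires hashing the value, and lists are unhashable, hence the TypeError. Loop over the words inside each inner list instead.

    Try: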

    from nltk.corpus import stopwords

    word_split1 = [['amazon','brand','-','solimo','premium','almonds',',','250g','by','solimo'],
                   ['hersheys','cocoa', 'powder', ',', '225g', 'by', 'hersheys'],
                   ['jbl','t450bt','extra','bass','wireless','on-ear','headphones','with','mic','white','by','jbl','and']]
    stop_words = set(stopwords.words('english'))
    filtered_words = []
    for i in word_split1:          # i is one inner list (one product title)
        for j in i:                # j is a single word, i.e. a hashable string
            if j not in stop_words:
                filtered_words.append(j)
    

    Alternatively, flatten the list of lists first with itertools.chain.from_iterable.

    Example:

    from itertools import chain
    from nltk.corpus import stopwords

    word_split1 = [['amazon','brand','-','solimo','premium','almonds',',','250g','by','solimo'],
                   ['hersheys','cocoa', 'powder', ',', '225g', 'by', 'hersheys'],
                   ['jbl','t450bt','extra','bass','wireless','on-ear','headphones','with','mic','white','by','jbl','and']]
    stop_words = set(stopwords.words('english'))
    filtered_words = []
    # chain.from_iterable yields the words of each inner list in turn
    for i in chain.from_iterable(word_split1):
        if i not in stop_words:
            filtered_words.append(i)
    

    or, as a single list comprehension:

    filtered_words = [i for i in chain.from_iterable(word_split1) if i not in stop_words]
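The solutions above return one flat list, so the boundaries between product titles are lost. If you need one filtered list per title, a nested list comprehension keeps the list-of-lists structure. This is a minimal sketch that again uses a small hypothetical stand-in set instead of `set(stopwords.words('english'))` so it runs without the NLTK corpus; swap the real set back in for actual use.

```python
# Hypothetical stand-in for set(stopwords.words('english')); it only needs
# to contain strings, since each word is looked up individually.
stop_words = {'by', 'with', 'and'}

word_split1 = [['amazon', 'brand', '-', 'solimo', 'premium', 'almonds', ',', '250g', 'by', 'solimo'],
               ['hersheys', 'cocoa', 'powder', ',', '225g', 'by', 'hersheys'],
               ['jbl', 't450bt', 'extra', 'bass', 'wireless', 'on-ear', 'headphones',
                'with', 'mic', 'white', 'by', 'jbl', 'and']]

# The outer comprehension walks the titles and the inner one filters each
# title's words, so the result is still a list of lists (one per title).
filtered_per_title = [[word for word in title if word not in stop_words]
                      for title in word_split1]

for title in filtered_per_title:
    print(title)
```

Note that punctuation tokens such as ',' are not in NLTK's English stopword list, so they survive this filter just as they do in the flat versions.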