Tags: python, matplotlib, nltk, visualization

Create a matplotlib table/graph from NLTK ngrams


I've used the ngrams feature in NLTK to create bigrams for a set of product reviews. I cleaned the data, tokenised the text and so on, using the following code:

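# Imports needed by this snippet (df3 and en_stopwords are defined earlier)
import itertools
from nltk import FreqDist, bigrams, word_tokenize

myDataNeg = df3[df3['sentiment_cat']=='Negative']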

# Tokenise each review
myTokensNeg = [word_tokenize(Reviews) for Reviews in myDataNeg['clean_review']]

# Remove stopwords and lowercase all
# Note that len(Reviews)>1 drops every token from reviews that have fewer than two tokens.
myTokensNeg_noSW_noCase = [[word.lower() for word in Reviews if (len(Reviews)>1) and 
                            (word.lower() not in en_stopwords) and 
                            (len(word)>3)] for Reviews in myTokensNeg]

# Generate lists of bigrams
myBigramNeg = [list(bigrams(Reviews)) for Reviews in myTokensNeg_noSW_noCase]
#myBigramNeg = [list(ngrams(Reviews,n)) for Reviews in myTokensNeg_noSW_noCase]

# Put all lists together
myBigramListNeg = list(itertools.chain.from_iterable(myBigramNeg))


# Get the most frequent ones
bigramFreqNeg = FreqDist(myBigramListNeg)
negbigram = bigramFreqNeg.most_common(5)
negbigram

My output shows the most common pairs of words and their frequencies:

[(('stopped', 'working'), 637),
 (('battery', 'life'), 408),
 (('waste', 'money'), 354),
 (('samsung', 'galaxy'), 322),
 (('apple', 'store'), 289)]

However, I want to visualise this using matplotlib. How do I produce a simple table or bar chart showing the most frequently occurring bigrams and their frequencies from this output? I tried the code below, but it just returns an error:

import matplotlib.pyplot as plt

negbigram.plot.barh(color='blue', width=.9, figsize=(12, 8))

OUT:

AttributeError: 'list' object has no attribute 'plot'

I'm quite new to Python, and any help would be greatly appreciated.

Thanks in advance


Solution

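  • negbigram is a plain Python list, which has no .plot attribute. You need to split your output into the labels for one axis and the counts for the other, then pass both to plt.barh.

    See the matplotlib documentation for plt.barh for more information.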

    import matplotlib.pyplot as plt
    
    out_ = [
        (('stopped', 'working'), 637),
        (('battery', 'life'), 408),
        (('waste', 'money'), 354),
        (('samsung', 'galaxy'), 322),
        (('apple', 'store'), 289)
    ]
    
    # join the 2 words with '-' in the middle
    wrds = ['-'.join(x) for x, c in out_]
    
    # get the counts
    wdth = [c for x, c in out_]
    
    plt.barh(wrds, wdth, color='blue')
    

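
Since the question also asks for a simple table, here is a minimal sketch of how the same list could be drawn either as a bar chart or as a matplotlib table. The helper names split_bigrams, draw_bigram_bars and draw_bigram_table are made up for this illustration and are not part of NLTK or matplotlib; only ax.barh, ax.table, ax.invert_yaxis and ax.axis are real matplotlib calls. It assumes the data has the same shape as the output of most_common() shown in the question.

```python
import matplotlib.pyplot as plt


def split_bigrams(pairs, sep=" "):
    """Turn [((w1, w2), count), ...] into parallel lists of labels and counts."""
    labels = [sep.join(words) for words, _ in pairs]
    counts = [count for _, count in pairs]
    return labels, counts


def draw_bigram_bars(pairs, ax=None):
    """Horizontal bar chart with the most frequent bigram at the top."""
    if ax is None:
        _, ax = plt.subplots(figsize=(8, 4))
    labels, counts = split_bigrams(pairs)
    bars = ax.barh(labels, counts, color="blue")
    ax.invert_yaxis()  # barh draws the first item at the bottom by default
    ax.set_xlabel("Frequency")
    return bars


def draw_bigram_table(pairs, ax=None):
    """Render the bigram counts as a table on an otherwise empty axes."""
    if ax is None:
        _, ax = plt.subplots(figsize=(6, 2))
    labels, counts = split_bigrams(pairs)
    ax.axis("off")  # hide the axes so that only the table is visible
    return ax.table(
        cellText=[[label, str(count)] for label, count in zip(labels, counts)],
        colLabels=["Bigram", "Frequency"],
        loc="center",
    )


# Same shape as bigramFreqNeg.most_common(5) in the question
negbigram = [
    (("stopped", "working"), 637),
    (("battery", "life"), 408),
    (("waste", "money"), 354),
    (("samsung", "galaxy"), 322),
    (("apple", "store"), 289),
]

bars = draw_bigram_bars(negbigram)
table = draw_bigram_table(negbigram)
```

In a notebook the figures should display on their own; in a script, call plt.show() or plt.savefig(...) afterwards. Passing an existing ax to either helper lets you place the chart and the table side by side in one figure.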