Histogram based on frequent/common words


I am trying to create a histogram of the most frequent words, but I only get errors when I run the code. I managed to find the 10 most common words, but I can't visualize them in a histogram.

description_list = df['description'].values.tolist()

from collections import Counter
Counter(" ".join(description_list).split()).most_common(10)

#histogram 
plt.bar(x, y)
plt.title("10 most frequent tokens in description")
plt.ylabel("Frequency")
plt.xlabel("Words")
plt.show

Solution

  • It looks like the code is missing a few things:

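    1. The result of Counter(...).most_common(10) is never assigned to a variable, so it is discarded
    2. x and y appear to be unbound (never defined), which would raise a NameError at plt.bar(x, y)
    3. plt.show is referenced but not called (the parentheses are missing), so it either does nothing or prints something like <function show at 0x...>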

    Here's a reproducible example that fixes these:

    from collections import Counter
    import matplotlib.pyplot as plt
    import pandas as pd
    
    data = {
        "description": [
            "This is the first example",
            "This is the second example",
            "This is similar to the first two",
            "This exists add more words"
        ]
    }
    df = pd.DataFrame(data)
    
    
    description_list = df['description'].values.tolist()
    
    # Assign the result of the `most_common` call to a variable:
    word_frequency = Counter(" ".join(description_list).split()).most_common(10)
    
    # `most_common` returns a list of (word, count) tuples
    words = [word for word, _ in word_frequency]
    counts = [count for _, count in word_frequency]
    
    plt.bar(words, counts)
    plt.title("10 most frequent tokens in description")
    plt.ylabel("Frequency")
    plt.xlabel("Words")
    plt.show()
    

    With expected output:

    [Figure: bar chart of the 10 most frequent words. 'This' occurs four times; 'exists' occurs once.]
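
As a variation on the same idea, the two list comprehensions can be replaced by unpacking the (word, count) tuples with zip. This is only a minimal sketch: the helper name `top_words` is hypothetical (it is not part of the original answer), and it uses the same plain whitespace split as above, so tokens stay case-sensitive and keep any punctuation.

```python
from collections import Counter


def top_words(descriptions, n=10):
    """Return (words, counts) for the n most common whitespace-separated tokens.

    `top_words` is a hypothetical helper name used only for illustration.
    """
    word_frequency = Counter(" ".join(descriptions).split()).most_common(n)
    if not word_frequency:
        # zip(*[]) cannot be unpacked into two names, so handle empty input
        return [], []
    # Transpose [(word, count), ...] into (word, ...) and (count, ...)
    words, counts = zip(*word_frequency)
    return list(words), list(counts)


descriptions = [
    "This is the first example",
    "This is the second example",
    "This is similar to the first two",
    "This exists add more words",
]
words, counts = top_words(descriptions)
print(words[0], counts[0])  # This 4
```

The returned lists can be passed straight to plt.bar(words, counts), followed by plt.show() with the parentheses.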