python-3.x word-frequency

How to extract the 10 most frequent words in a text as a list of words in Python?


I have a text and am trying to extract the 10 most frequent words in it. I use the text.most_common(10) method, but I am getting the output as tuples that also contain the number of occurrences (which I don't need). How can I fix this so that the output is just a list of the words?

Note: I can't use the nltk library in the program to be created.

This is the code I wrote:

tuple(map(str, Counter(text).most_common(10)))

This is the output I am getting:

('science', 55)

This is the output I need:

["science"]

Solution

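  • Counter.most_common() returns a list of (word, count) pairs, and map(str, ...) only turns each pair into a string such as "('science', 55)". To get just the words, take the first item of each pair: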

    [t[0] for t in counter.most_common(10)]
    

    Full demo:

    from collections import Counter
    
    text = """\
    A Counter is a dict subclass for counting hashable objects. It is a collection
    where elements are stored as dictionary keys and their counts are stored as
    dictionary values. Counts are allowed to be any integer value including zero or
    negative counts. The Counter class is similar to bags or multisets in other
    languages.
    
    Elements are counted from an iterable or initialized from another mapping (or
    counter):
    
    Counter objects have a dictionary interface except that they return a zero
    count for missing items instead of raising a KeyError:
    
    Setting a count to zero does not remove an element from a counter. Use del to
    remove it entirely:
    
    New in version 3.1.
    
    Changed in version 3.7: As a dict subclass, Counter inherited the capability to
    remember insertion order. Math operations on Counter objects also preserve
    order. Results are ordered according to when an element is first encountered in
    the left operand and then by the order encountered in the right operand.
    """
    
    counter = Counter(text.split())
    
    [t[0] for t in counter.most_common(10)]
    

    gives

    ['a', 'to', 'Counter', 'are', 'in', 'is', 'the', 'dictionary', 'zero', 'or']
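
Note that text.split() treats "Counter" and "counter):" as different words, because case and punctuation are kept. If that matters for your text, one possible variant is sketched below. The helper name top_words and the regular expression are assumptions made for illustration, not part of the answer above; the pattern only keeps ASCII letters and apostrophes, so adjust it for other alphabets.

```python
import re
from collections import Counter


def top_words(text, n=10):
    """Return the n most frequent words in text as a plain list of strings.

    Hypothetical helper for illustration only.
    """
    # Lowercase the text and keep only runs of letters and apostrophes,
    # so that "Science", "science," and "SCIENCE" count as the same word.
    words = re.findall(r"[a-z']+", text.lower())
    # most_common() yields (word, count) pairs; unpack and keep the word.
    return [word for word, _count in Counter(words).most_common(n)]


sample = "Science is fun. science is hard, but SCIENCE is worth it."
print(top_words(sample, 2))  # ['science', 'is']
```

Words with equal counts come back in the order they were first seen (Python 3.7+), which is why 'science' is listed before 'is' here.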