python-2.7 count file-handling vsm

How can I find the frequency of a specific word in a document in Python?


I want to find the frequency of a specific word in a text file. Suppose my document contains the line "this is me is is ": if I input 'is' the output should be 3, and if I input 'me' the output should be 1. I am trying this code:

    import re
    doc1 = re.findall(r'\w+', open('E:\doc1.txt').read().lower())
    words = raw_input("Input Number :: ")
    docmtfrequency1 =  words.count(words)

but it does not give the desired output.


Solution

  • If I understand your problem, collections.Counter has this covered. The example from the docs seems to match your case:

    from collections import Counter

    # Tally occurrences of words in a list
    cnt = Counter()
    for word in ['red', 'blue', 'red', 'green', 'blue', 'blue']:
        cnt[word] += 1
    print cnt

    # Find the ten most common words in Hamlet
    import re
    words = re.findall(r'\w+', open('hamlet.txt').read().lower())
    Counter(words).most_common(10)
    

    From the example above you should be able to do:

    import re
    import collections
    # Lowercase the file contents and split them into words
    words = re.findall(r'\w+', open('1976.03.txt').read().lower())
    # Print the count of every distinct word
    print collections.Counter(words)
    

    A naive approach that counts only a chosen set of words:

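    import re
    from collections import Counter

    # Split into separate words; testing against the raw string would
    # also match substrings such as "is" inside "fish"
    wanted = set("fish chips steak".split())
    cnt = Counter()
    words = re.findall(r'\w+', open('1976.03.txt').read().lower())
    for word in words:
        if word in wanted:
            cnt[word] += 1
    print cnt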
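
To tie this back to the question: `words.count(words)` counts how often the input string occurs inside itself, so it returns 1 whatever the file contains. Below is a minimal sketch of looking up a single word instead. The helper name `word_frequency` is hypothetical, and an inline sample string stands in for the contents of the file. The `__future__` import is only there so that the same code runs on Python 2.7 and Python 3.

```python
from __future__ import print_function  # lets print() behave the same on Python 2.7

import re
from collections import Counter


def word_frequency(text, word):
    """Return how many times `word` occurs in `text`, ignoring case.

    Hypothetical helper for illustration, not part of any library.
    """
    # Split the lowercased text into whole words, as in the examples above
    tokens = re.findall(r'\w+', text.lower())
    # A Counter returns 0 for missing keys, so unknown words need no special case
    return Counter(tokens)[word.lower()]


# Sample line from the question, standing in for the file contents
sample = "this is me is is "
print(word_frequency(sample, "is"))
print(word_frequency(sample, "me"))
```

Because the text is split into whole words first, the "is" inside "this" is not counted. For a one-off lookup, calling `.count(word)` on the list returned by `re.findall` should give the same number without building a Counter.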