
How can I count occurrences of words specified in an array in Python?


I am working on a small program in which the user enters text, and I would like to count how many times each of a given set of words occurs in that input.

import sys

# Read user input
print("Input your code: \n")

user_input = sys.stdin.read()
print(user_input)

For example, the text that I enter into the program is:

a=1
b=3
if (a == 1):
    print("A is a number 1")
elif(b == 3):
    print ("B is 3")
else: 
    print("A isn't 1 and B isn't 3")

The words to look for are specified in an array (a Python list):

wordsToFind = ["if", "elif", "else", "for", "while"]

Basically, I would like to print how many times "if", "elif" and "else" occur in the input.

How can I count occurrences of words like "if", "elif", "else", "for" and "while" in a string entered by the user?


Solution

  • I think the best option is to use Python's standard-library tokenize module:

    # Let's say this is tokens.py
    import sys
    from collections import Counter
    from io import BytesIO
    from tokenize import tokenize
    
    # Get input from stdin
    code_text = sys.stdin.read()
    
    # Tokenize the input as python code
    tokens = tokenize(BytesIO(code_text.encode("utf-8")).readline)
    
    # Filter the ones in wordsToFind
    wordsToFind = ["if", "elif", "else", "for", "while"]
    words = [token.string for token in tokens if token.string in wordsToFind]
    
    # Count the occurrences
    counter = Counter(words)
    
    print(counter)
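
Printing the Counter directly only shows the words that occurred. If a line for every entry of wordsToFind is wanted, including zeros, a minimal sketch could look like the following. The helper name format_counts is hypothetical, and the Counter here is a hand-built stand-in for the one produced above.

```python
from collections import Counter

words_to_find = ["if", "elif", "else", "for", "while"]

# Stand-in for the Counter built from the token stream (assumption for this sketch).
counter = Counter(["if", "elif", "else"])


def format_counts(counter, words):
    """Return one 'word: count' line per word; a Counter yields 0 for missing keys."""
    return [f"{word}: {counter[word]}" for word in words]


for line in format_counts(counter, words_to_find):
    print(line)
```

Indexing a Counter with a key it has never seen returns 0 instead of raising KeyError, so no extra check is needed for "for" and "while".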
    

    Test

    Let's say you have a test.py:

    a=1
    b=3
    if (a == 1):
        print("A is a number 1")
    elif(b == 3):
        print ("B is 3")
    else: 
        print("A isn't 1 and B isn't 3")
    

    and then you run:

    cat test.py | python tokens.py
    

    Output:

    Counter({'if': 1, 'elif': 1, 'else': 1})
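
The run above assumes input that tokenizes cleanly. If the text cannot be tokenized (for example, a bracket left open at the end of the input), tokenize raises an exception while the tokens are being consumed. A hedged sketch of handling that case follows; the helper name safe_count is hypothetical, and it uses generate_tokens on a str rather than the bytes-based tokenize call from the script.

```python
import io
import tokenize
from collections import Counter

WORDS_TO_FIND = ["if", "elif", "else", "for", "while"]


def safe_count(source):
    """Return a Counter of the target words, or None if tokenizing fails."""
    try:
        tokens = tokenize.generate_tokens(io.StringIO(source).readline)
        # The token generator is lazy, so it has to be consumed inside the try block.
        return Counter(t.string for t in tokens if t.string in WORDS_TO_FIND)
    except (tokenize.TokenError, SyntaxError) as exc:
        # Depending on the Python version, lexical problems surface as
        # TokenError or as a SyntaxError subclass such as IndentationError.
        print(f"Could not tokenize input: {exc}")
        return None


print(safe_count("if x:\n    pass\n"))
print(safe_count("if (x == 1:\n    pass\n"))  # unclosed parenthesis
```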
    

    Advantages

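    • The input is tokenized as Python source, so some malformed input (for example, an unclosed bracket) raises an error instead of being counted silently. Note that tokenize only performs lexical analysis and does not check the full syntax.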

    • You will only be counting Python keywords, not every occurrence of "if" in the code text. For example, you can have a line like:

      a = "if inside str"

      That "if" should not be counted, I think.
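
To illustrate the last point, here is a small sketch that counts only tokens of type NAME, which is how tokenize reports keywords. The script above compares token.string only, so the extra type check is an assumption added here to make the intent explicit; it may matter on Python 3.12 and later, where the literal parts of an f-string are reported as separate tokens. The function name count_keywords is hypothetical.

```python
import io
import tokenize
from collections import Counter

WORDS_TO_FIND = {"if", "elif", "else", "for", "while"}


def count_keywords(source, words=WORDS_TO_FIND):
    """Count NAME tokens in `source` whose text is one of `words`."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return Counter(
        tok.string
        for tok in tokens
        if tok.type == tokenize.NAME and tok.string in words
    )


# One "if" inside a string, one inside a comment, and one real keyword.
sample = 'a = "if inside str"  # if in a comment\nif a:\n    pass\n'
print(count_keywords(sample))  # Counter({'if': 1})
```

The string literal arrives as a single STRING token and the comment as a single COMMENT token, so neither contributes to the count.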