python, cloud, nltk

How to count the number of times a word appears across multiple cells using Python


I want to find a way to count how many times the same word appears across multiple rows. In the example below, "Street" appears three times and "Carla" appears twice. (Note: there are many such rows, and I don't know in advance which words are common.)

Description
Street 29 euro
Street 31 USD
Carla xyz 45 output
Street 345 tmd
Carla asb 6789 tim

Please help


Solution

  • I'm not sure what format your data is in, but let's assume it's a pandas DataFrame.

    First, select the column. This gives a pandas Series, which you can iterate over like a list:

    rows = df["Description"]
    

    Create a large list as a container for all words:

    large_list = []
    

    Iterate over the rows, split each row on whitespace, and append that row's words to the large list:

    for row in rows:
        large_list += row.split()
    

    Count how often each element (word) in the list occurs:

    import collections
    counts = collections.Counter(large_list)
    print(counts)
    

    You might want to add filters, for example keeping only tokens that consist solely of letters (which drops numbers such as 29) or removing stopwords.
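
Putting the steps and the suggested filters together, here is a minimal sketch using the sample rows from the question. A few things in it are assumptions rather than part of the original answer: the plain list `descriptions` stands in for `df["Description"]`, the function name `count_words` and the tiny `STOPWORDS` set are made up for illustration, and lowercasing the words (so that "Street" and "street" count as the same word) is my own choice.

```python
import collections

# Sample rows copied from the question. This plain list is a stand-in
# (assumption) for df["Description"]; a pandas Series can be passed instead.
descriptions = [
    "Street 29 euro",
    "Street 31 USD",
    "Carla xyz 45 output",
    "Street 345 tmd",
    "Carla asb 6789 tim",
]

# Hypothetical stopword set for illustration only; replace with your own list.
STOPWORDS = {"the", "a", "an", "of"}


def count_words(rows, stopwords=STOPWORDS):
    """Count letter-only, non-stopword tokens across all rows (case-insensitive)."""
    counts = collections.Counter()
    for text in rows:
        # str() guards against non-string cells such as NaN or numbers.
        for token in str(text).split():
            word = token.lower()
            # Keep only tokens made of letters and not in the stopword set.
            if word.isalpha() and word not in stopwords:
                counts[word] += 1
    return counts


counts = count_words(descriptions)

# Keep only the words that occur more than once overall.
repeated = {word: n for word, n in counts.items() if n > 1}
print(repeated)  # {'street': 3, 'carla': 2}
```

Note that this counts every occurrence, so a word that shows up twice in the same row is counted twice. If the data really is in a DataFrame, `df["Description"].str.split().explode().value_counts()` should give the unfiltered counts without an explicit loop.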