I want to figure out a way to count the number of times the same word appears across multiple rows. For example, in the table below, Street appears three times and Carla appears twice. (Note: there are many such rows, and I do not know in advance which words are common.)
Description |
---|
Street 29 euro |
Street 31 USD |
Carla xyz 45 output |
Street 345 tmd |
Carla asb 6789 tim |
Please help
Not sure what format your data is in, but let's assume it's a pandas DataFrame.
First, convert the column to a list:
rows = df["Description"].tolist()
Create a large list as a container for all words:
large_list = []
Iterate over the rows, split each row on whitespace, and add that row's words to the large list:
for row in rows:
    large_list += row.split()
Count how often each element (word) in the list occurs:
import collections
counts = collections.Counter(large_list)
print(counts)
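Note that the counter above counts every occurrence, so a word repeated within a single row is counted more than once. If what you want is the number of rows a word appears in, here is a minimal sketch of that variation. It assumes the rows are plain strings (the sample list below is copied from the question and stands in for `df["Description"].tolist()`), and the names `row_counts` and `common` are just illustrative.

```python
from collections import Counter

# Sample rows from the question; with a DataFrame this would be
# df["Description"].tolist()
rows = [
    "Street 29 euro",
    "Street 31 USD",
    "Carla xyz 45 output",
    "Street 345 tmd",
    "Carla asb 6789 tim",
]

row_counts = Counter()
for row in rows:
    # set() makes sure a word repeated inside one row counts once for that row
    row_counts.update(set(row.split()))

# Keep only the words that occur in more than one row
common = {word: n for word, n in row_counts.items() if n > 1}
print(common)
```

For the sample data this leaves only Street (3 rows) and Carla (2 rows), since every other token occurs in a single row.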
You might want to add filters, e.g. keeping only words that consist of letters (no numbers), stopword filtering, etc.
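A rough sketch of such filtering could look like the following. It uses `str.isalpha()` to drop tokens that contain digits, and a tiny stopword set that is only a placeholder (you would likely swap in a proper list for your language). The helper name `count_words` and the `STOPWORDS` set are assumptions for illustration, not part of any library.

```python
from collections import Counter

# Placeholder stopword list (an assumption); replace with a real one as needed
STOPWORDS = {"the", "a", "an", "and", "of"}

def count_words(rows, stopwords=STOPWORDS):
    """Count words across rows, keeping only alphabetic, non-stopword tokens."""
    counts = Counter()
    for row in rows:
        for word in row.split():
            # isalpha() is False for tokens such as "29" or "6789"
            if word.isalpha() and word.lower() not in stopwords:
                counts[word] += 1
    return counts

rows = [
    "Street 29 euro",
    "Street 31 USD",
    "Carla xyz 45 output",
    "Street 345 tmd",
    "Carla asb 6789 tim",
]

counts = count_words(rows)
print(counts.most_common(2))  # [('Street', 3), ('Carla', 2)]
```

With a DataFrame you would call it as `count_words(df["Description"].tolist())`.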