Search code examples
pythondataframecounter

Using Counter() function in python


I have an excel file with many rows and columns. I want to do the following. First, I want to filter the rows based on a text match. Second, I want to choose a particular column and generate word frequency for ALL THE WORDS in that column. Third, I want to graph the word and frequency.

I have figured out the first part. My question is how to apply Counter() on a dataframe. If I just use Counter(df), it returns an error. So, I used the following code to convert each row into a list and then applied Counter. When I do this, I get the word frequency for each row separately (if I use counter within the for loop, else I get the word frequency for just one row). However, I want a word count for all the rows put together. Appreciate any inputs. Thanks! The following is an example data.

product      review
a            Great Product
a            Delivery was fast 
a            Product received in good condition
a            Fast delivery but useless product
b            Dont recommend
b            I love it
b            Please dont buy
b            Second purchase

My desired output is like this: for product a - (product,3),(delivery,2)(fast,2) etc..my current output is like (great,1), (product,1) for the first row.

This is the code I used.

strdata = column.values.tolist() 
tokens = [tokenizer.tokenize(str(i)) for i in strdata] 

cleaned_list = []
for m in tokens:
    stopped = [i for i in m if str(i).lower() not in stop_words] 
    stemmed = [stemmer.stem(i) for i in stopped] 
    cleaned_list.append(stopped) #append stemmed words to list
    count = Counter(stemmed)
    print(count.most_common(10))


Solution

  • Firstly, using groupby concatenate strings from same group.

    Secondly, apply Counter() on joined strings.

    joined = df.groupby('product', as_index=False).agg({'review' : ' '.join})
    joined['count'] = joined.apply(lambda x: collections.Counter(x['review'].split(' ')), axis=1)
    
    # print(joined)
    
      product                                             review                                              count
    0       a  Great Product Delivery was fast Product receiv...  {'Great': 1, 'Product': 2, 'Delivery': 1, 'was...
    1       b  Dont recommend I love it Please dont buy Secon...  {'Dont': 1, 'recommend': 1, 'I': 1, 'love': 1,...