python pandas collections counter

Applying Counter from collections to a column in a DataFrame


I have a DataFrame column in which each row is a list of strings. I want to count the individual elements across the entire column, not the rows themselves, which is what value_counts() in pandas gives. I want to apply Counter() from the collections module, but that runs only on a list. My column in the DataFrame looks like this:

[['FollowFriday', 'Awesome'],
 ['Covid_19', 'corona', 'Notagain'],
 ['Awesome'],
 ['FollowFriday', 'Awesome'],
 [],
 ['corona', 'Notagain'],
....]

I want to get the counts, such as

[('FollowFriday', 2),
 ('Awesome', 3),
 ('corona', 2),
 ('Covid_19', 1),
 ('Notagain', 2),
 .....]

The basic command that I am using is:

from collections import Counter
Counter(df['column'])

OR

from collections import Counter
Counter(" ".join(df['column']).split()).most_common() 

Any help would be greatly appreciated!


Solution

  • If I understand correctly, you mentioned pandas only to explain your goal, and the data you actually want to count is a list of lists?

    If so, flatten the lists with itertools.chain and count the elements:

    l = [['FollowFriday', 'Awesome'],
         ['Covid_19', 'corona', 'Notagain'],
         ['Awesome'],
         ['FollowFriday', 'Awesome'],
         [],
         ['corona', 'Notagain'],
        ]
    
    from collections import Counter
    from itertools import chain
    
    out = Counter(chain.from_iterable(l))
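
The question asks for a list of (element, count) tuples rather than a Counter. A minimal sketch of that last step, assuming the sample data from the question (the names `rows`, `counts`, and `pairs` are illustrative only):

```python
from collections import Counter
from itertools import chain

# Sample data copied from the question; `rows` is an illustrative name.
rows = [
    ['FollowFriday', 'Awesome'],
    ['Covid_19', 'corona', 'Notagain'],
    ['Awesome'],
    ['FollowFriday', 'Awesome'],
    [],
    ['corona', 'Notagain'],
]

# Flatten the nested lists lazily, then count each element.
counts = Counter(chain.from_iterable(rows))

# most_common() returns (element, count) pairs sorted by descending count.
pairs = counts.most_common()
print(pairs)
```

Calling most_common(n) with an integer keeps only the n most frequent elements, which may be handy for long columns.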
    

    If you have a Series of lists instead, use explode to put each element on its own row:

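    # explode() turns an empty list into NaN, so drop it before counting
    out = Counter(df['column'].explode().dropna())
    # OR (value_counts() skips NaN by default and returns a Series)
    out = df['column'].explode().value_counts()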
    

    Output (Counter version):

    Counter({'FollowFriday': 2,
             'Awesome': 3,
             'Covid_19': 1,
             'corona': 2,
             'Notagain': 2})
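
For the Series case, the whole flow can be sketched end to end. This is only an illustration under stated assumptions: the DataFrame below is rebuilt by hand from the question's sample, and the names `exploded`, `tags`, `counter`, and `vc` are hypothetical. It also shows that an empty list becomes NaN after explode, which is why the sketch drops missing values before handing the data to Counter.

```python
import pandas as pd
from collections import Counter

# Hypothetical DataFrame mirroring the question's sample data.
df = pd.DataFrame({'column': [
    ['FollowFriday', 'Awesome'],
    ['Covid_19', 'corona', 'Notagain'],
    ['Awesome'],
    ['FollowFriday', 'Awesome'],
    [],
    ['corona', 'Notagain'],
]})

# One row per list element; the empty list yields a single NaN row.
exploded = df['column'].explode()

# Drop the NaN placeholder so it is not counted as an element.
tags = exploded.dropna()

counter = Counter(tags)      # plain-Python result
vc = tags.value_counts()     # pandas Series, sorted by descending count

print(counter.most_common())
print(vc.to_dict())
```

Both results hold the same counts; Counter is convenient when the rest of the code is plain Python, while value_counts() keeps everything in pandas.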