Tags: python, pandas, dataframe, pandas-groupby, python-collections

Summing up collections.Counter objects using `groupby` in pandas


I am trying to group the words_count column by both essay_set and domain1_score and add up the Counter objects in words_count within each group, using Counter addition as in this example:

>>> c = Counter(a=3, b=1)
>>> d = Counter(a=1, b=2)
>>> c + d                       # add two counters together:  c[x] + d[x]
Counter({'a': 4, 'b': 3})

I grouped them with `words_freq_by_set = words_freq_by_set.groupby(by=["essay_set", "domain1_score"])`, but I do not know how to apply Counter addition (which is simply `+`) to the words_count column of each group. Here is my dataframe:

[Image: screenshot of the dataframe]


Solution

  • `GroupBy.sum` works with Counter objects. Be aware that the counters are added pairwise, so this may be slow when groups contain many rows. Try:

    words_freq_by_set.groupby(by=["essay_set", "domain1_score"])['words_count'].sum()
    

    # Minimal reproducible example
    import pandas as pd
    from collections import Counter

    df = pd.DataFrame({
        'a': [1, 1, 2],
        'b': [Counter([1, 2]), Counter([1, 3]), Counter([2, 3])]
    })
    df
    
       a             b
    0  1  {1: 1, 2: 1}
    1  1  {1: 1, 3: 1}
    2  2  {2: 1, 3: 1}
    
    
    # Add up the Counter objects in column 'b' within each group of 'a'
    df.groupby(by=['a'])['b'].sum()
    
    a
    1    {1: 2, 2: 1, 3: 1}
    2          {2: 1, 3: 1}
    Name: b, dtype: object
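
To illustrate what "pairwise" means above, here is a minimal standard-library sketch that compares repeated `+` with accumulating into a single Counter via `update`. The helper names `merge_pairwise` and `merge_in_place` are hypothetical, made up for this illustration; they are not part of pandas or `collections`.

```python
from collections import Counter
from functools import reduce
import operator


def merge_pairwise(counters):
    """Add counters two at a time, the way repeated `+` does."""
    # Every `+` returns a brand-new Counter, so keys seen early are copied
    # again on each later addition.
    return reduce(operator.add, counters, Counter())


def merge_in_place(counters):
    """Accumulate all counters into one running total."""
    total = Counter()
    for counter in counters:
        total.update(counter)  # adds counts in place, no intermediate Counter
    return total


# The two Counters from the rows where a == 1 in the example above
group = [Counter([1, 2]), Counter([1, 3])]
print(merge_pairwise(group))
print(merge_in_place(group))
```

Both helpers give the same totals for ordinary word counts. One difference to keep in mind: `+` on Counters drops keys whose resulting count is zero or negative, while `update` keeps them. Whether the in-place version is actually faster on your data is an assumption worth timing before relying on it.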