Search code examples
pythondataframepandas-groupbypandas-apply

Count values using groupby function and using apply function at the same time


I'm trying to count the occurance of grouped values and write values in a column using apply and grouby function on a dataframe. I have the following data frame:

df = pd.DataFrame({'colA': ['name1', 'name2', 'name2', 'name4', 'name2', 'name5', 'name5'], 'colB': ['red', 'yellow', 'yellow', 'black', 'yellow', 'green', 'blue']})

I have two statements which gives me the correct result, but I need the result in combination. One is:

df_new = df.groupby("colA").count()

which gives

colA
name1    1
name2    3
name4    1
name5    2

the other one is

df_new = df.groupby("colA")["colB"].apply(lambda lists: ','.join(color)).reset_index(name='Color')

and gives

    colA                Color
0  name1                   red
1  name2  yellow,yellow,yellow
2  name4                 black
3  name5            green,blue

what I need is the combination looking like

    colA                Color      Count grouped A
0  name1                   red     1
1  name2  yellow,yellow,yellow     3
2  name4                 black     1
3  name5            green,blue     2

Trying to combinate in many ways and of course did research but I couldn't get it right.


Solution

  • You can join first to second as new column and use colA to assign values in correct places.

    df_new = df_2.join(df_1, on='colA')
    

    It needed also df_1.rename(columns={'colB': 'Count grouped A'})


    import pandas as pd
    
    df = pd.DataFrame({'colA': ['name1', 'name2', 'name2', 'name4', 'name2', 'name5', 'name5'], 'colB': ['red', 'yellow', 'yellow', 'black', 'yellow', 'green', 'blue']})
    
    df_1 = df.groupby("colA").count().rename(columns={'colB': 'Count grouped A'})
    
    df_2 = df.groupby("colA")["colB"].apply(lambda lists: ','.join(lists)).reset_index(name='Color')
    
    df_new = df_2.join(df_1, on='colA')
    
    print(df_new)
    

    EDIT:

    The same with small changes

    • first groups = df.groupby("colA") and later two times groups...

    • .apply(','.join) instead of .apply(lambda lists: ','.join(lists))

    import pandas as pd
    
    df = pd.DataFrame({'colA': ['name1', 'name2', 'name2', 'name4', 'name2', 'name5', 'name5'], 'colB': ['red', 'yellow', 'yellow', 'black', 'yellow', 'green', 'blue']})
    
    groups = df.groupby("colA")
    
    df_1 = groups.count().rename(columns={'colB': 'Count grouped A'})
    df_2 = groups["colB"].apply(','.join).reset_index(name='Color')
    
    df_new = df_2.join(df_1, on='colA')
    
    print(df_new)
    

    EDIT:

    If you would keep Color as list then it could be simpler.

    You could use .str.len() to count elements in list

    .str suggests that it has functions for strings but some of them work also with list (ie. .str[1:4]) or even with dictionary (ie. .str[key])

    import pandas as pd
    
    df = pd.DataFrame({'colA': ['name1', 'name2', 'name2', 'name4', 'name2', 'name5', 'name5'], 'colB': ['red', 'yellow', 'yellow', 'black', 'yellow', 'green', 'blue']})
    
    df_new = df.groupby("colA")["colB"].apply(list).reset_index(name='Color')
    df_new['Count grouped A'] = df_new['Color'].str.len()
    
    print(df_new)