Tags: python, pandas, dataframe, group-by, data-science

Count and group by a specific value


I have a dataframe where I want to count how often a specific value occurs. The code below gives the right answer, and now I want to add the result to my dataframe as a new column:

occur = df.groupby(['Code_5elaag','Essentieel_Optioneel']).size()
occur

Code_5elaag  Essentieel_Optioneel
1101         essentieel               8
             optioneel                8
1102         essentieel               8
             optioneel               51
1103         essentieel               8
                                     ..
96231        optioneel                6
96232        essentieel               1
             optioneel                2
96290        essentieel               9
             optioneel               17

When I assign a new column to the frame, this is the output:

uniq['ess'] = df.groupby(['Code_5elaag'])['Essentieel_Optioneel'].transform(np.size)

    Code_5elaag Omschrijving_5elaag Soort_Skill Aantal_skills   ess
0   1101    Officieren landmacht    taken   16  16              15
16  1102    Officieren luchtmacht   taken   59  59              59
75  1103    Officieren marechaussee taken   16  16              16

But that is not what I want. I want to split Aantal_skills into how many rows are essentieel and how many are optioneel, so for the first row it should be 8 essentieel and 8 optioneel.


Solution

  • You are close; you need to group by both columns:

    df['ess'] = df.groupby(['Code_5elaag','Essentieel_Optioneel'])['Essentieel_Optioneel'].transform('size')
    

    If you need two new columns (one per category), use crosstab with DataFrame.join:

    out = df.join(pd.crosstab(df['Code_5elaag'], df['Essentieel_Optioneel']), on='Code_5elaag')
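Both approaches can be tried on a small toy frame. This is only a sketch: the column names come from the question, but the five rows of data are made up for illustration, and `counts` is a hypothetical helper name.

```python
import pandas as pd

# Toy data (assumed for illustration; not the asker's real dataframe).
df = pd.DataFrame({
    "Code_5elaag": [1101, 1101, 1101, 1102, 1102],
    "Essentieel_Optioneel": [
        "essentieel", "essentieel", "optioneel", "optioneel", "optioneel",
    ],
})

# Approach 1: a single column holding the size of the (code, category)
# group that each row belongs to.
df["ess"] = (
    df.groupby(["Code_5elaag", "Essentieel_Optioneel"])["Essentieel_Optioneel"]
      .transform("size")
)

# Approach 2: one count column per category, joined back on the code.
counts = pd.crosstab(df["Code_5elaag"], df["Essentieel_Optioneel"])
out = df.join(counts, on="Code_5elaag")

print(out)
```

With approach 2, every row of a given code carries the same `essentieel` and `optioneel` totals, and a category that never occurs for a code shows up as 0. If only one row per code is wanted, as in the question's `uniq` frame, something like `out.drop_duplicates("Code_5elaag")` would be one way to reduce it.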