Search code examples
pythonpandasdataframepivot

Python - count for each of the unique values of one column for multiple other columns (crosstab,pivot)


I've researched this everywhere but nothing is getting me what I need. I have a dataframe with 3 columns. I want to count all the values of one column for the unique values of the other two columns. A crosstab only works with two columns, and the pivots I've tried haven't given me columns for all of the unique values in the counted column.

df=pd.DataFrame({'player':['A','A','B','B','C','D'],
                 'team':['tmX','tmX','tmX','tmX','tmY','tmY'],
                 'result':['hit','hit','hit','miss','miss','hit']})
print(df)

  player team result
0    A    tmX   hit 
1    A    tmX   hit
2    B    tmX   hit
3    B    tmX   miss
4    C    tmY   miss
5    D    tmY   hit

# code here to pivot/crosstab
# print(new_df)    

#this is the result I want
  player team hit miss
0     A   tmX  2    0
1     B   tmX  1    1
2     C   tmY  0    1
3     D   tmY  1    0
    

if I use groupby(), I get the following:

   new_df=xyz.groupby(['player','team'])['result'].count().reset_index()

   print(new_df)

   #this is what I get - counts result, but not by unqiue value
     player team result
   0    A    tmX   2
   1    B    tmX   2
   2    C    tmY   1 
   3    D    tmY   1

Solution

  • You can count sizes by all three columns, and then unstack the result index:

    (
      df.groupby(['player', 'team', 'result'])
        .size()
        .unstack(level=2, fill_value=0)
        .reset_index()
    )
    
    result player team  hit  miss
    0           A  tmX    2     0
    1           B  tmX    1     1
    2           C  tmY    0     1
    3           D  tmY    1     0