I have following data frame:
KEY PROD PARAMETER Y/N
1 AAA PARAM1 Y
1 AAA PARAM2 N
1 AAA PARAM3 N
2 AAA PARAM1 N
2 AAA PARAM2 Y
2 AAA PARAM3 Y
3 CCC PARAM1 Y
3 CCC PARAM2 Y
3 CCC PARAM3 Y
I am interested in summarizing Y/N column values by PROD and PARAMETER columns and get the following output:
PROD PARAM Y N
AAA PARAM1 1 1
AAA PARAM2 1 1
AAA PARAM3 1 1
CCC PARAM1 1 0
CCC PARAM2 1 0
CCC PARAM3 1 0
While Y and N values are counts of Y/N column values from the original data frame.
You could use pivot_table
by creating an additional column with the value 1 as it doesn't matter either ways (You are only counting them)
df['Y/Ncount'] = 1
df = df.pivot_table(index=['PROD', 'PARAMETER'], columns=['Y/N'], values=['Y/Ncount'],
aggfunc=sum, fill_value=0)
df.columns = [col for col in df.columns.get_level_values(1)]
df.reset_index()
The simplest operation to use under this scenario would be crosstab
which would produce the frequency counts of values present inside the Y/N column:
pd.crosstab([df['PROD'], df['PARAMETER']], df['Y/N'])