I have a data frame like this with below code,
df=pd.DataFrame(columns=['col1', 'col2', 'col3'])
df.col1=['q1', 'q2', 'q2', 'q3', 'q4', 'q4']
df.col2=['b', 'a', 'a', 'c', 'b', 'b']
df.col3=['p', 'q', 'r', 'p', 'q', 'q']
df
col1 col2 col3
0 q1 b p
1 q2 a q
2 q2 a r
3 q3 c p
4 q4 b q
5 q4 b q
Now I want to col1 wise count of each col2 and col3 variables as rows and unique values of col2 and col3 as columns, so that the final dataframe would look like,
col1 a b c p q r
q1 0 1 0 1 0 0
q2 2 0 0 0 1 1
q3 0 0 1 1 0 0
q4 0 2 0 0 2 0
I could do this using a for loop iterating through the dataframe and storing the results, but this is computationally expensive and not a right way to do it.
Looking for alternatives pythonic/pandas way to do it efficiently.
melt the data by col1
, then crosstab
:
m = df.melt('col1')
pd.crosstab(m['col1'], m['value']).reset_index()
Output:
value col1 a b c p q r
0 q1 0 1 0 1 0 0
1 q2 2 0 0 0 1 1
2 q3 0 0 1 1 0 0
3 q4 0 2 0 0 2 0