I have a dataframe as below. "feature1", "feature2" and "feature3" are independent features. The possible values in these features as 1, 2, 3, 4 only. I need to find the size, grouped by the values for each features. Shown below is how I'm doing it now, for each feature separately. Is there a way to do it in a single step or more optimal way ? Thank you.
import pandas as pd
df = pd.DataFrame({'State' : ['AZ', 'FL', 'AZ', 'FL', 'FL', 'FL', 'AZ', 'FL', 'AZ', 'AZ', 'FL', 'AZ', 'FL', 'AZ', 'AZ'],
'feature1' : [1, 3, 2, 2, 4, 3, 3, 1, 2, 4, 3, 1, 1, 1, 1],
'feature2' : [1, 2, 1, 2, 3, 2, 4, 1, 2, 4, 3, 1, 1, 1, 1],
'feature3' : [1, 4, 1, 2, 1, 4, 3, 3, 2, 4, 3, 1, 1, 1, 1],})
(
df
.groupby(['State', 'feature1'])
['feature1']
.size()
.to_frame('N')
)
Output for feature1:
melt
+ value_counts
df.melt('State', var_name='feature').value_counts().sort_index()
Result
State feature value
AZ feature1 1 4
2 2
3 1
4 1
feature2 1 5
2 1
4 2
feature3 1 5
2 1
3 1
4 1
FL feature1 1 2
2 1
3 3
4 1
feature2 1 2
2 3
3 2
feature3 1 2
2 1
3 2
4 2
dtype: int64
melt
+ crosstab
s = df.melt('State', var_name='feature')
pd.crosstab(s['State'], [s['feature'], s['value']])
Result
feature feature1 feature2 feature3
value 1 2 3 4 1 2 3 4 1 2 3 4
State
AZ 4 2 1 1 5 1 0 2 5 1 1 1
FL 2 1 3 1 2 3 2 0 2 1 2 2