I have a daframe where I want to perform multiple (independent) transformations, and they create a new data frame (with a multi-index), where each index correspond to one of the transformations.
More concretely:
df = pd.DataFrame([[1, "X", 'H', 100], [2, "Y", 'K', 100] ,[3, "X", 'H', 200], [4, "Y", 'H', 100]], columns=['id', 'XY', 'HK', 'A']).set_index("id")
XY HK A
id
1 X H 100
2 Y K 100
3 X H 200
4 Y H 100
I now want to know how many XY
and how many HK
have the same A
. The final results should look like this (with multi-index):
XY HK
X Y H K
A
100 1.0 2.0 2.0 1.0
200 1.0 NaN 1.0 NaN
This is what I was expecting to work:
df.groupby(['A', 'XY', 'HK']).count()
but it only contains the multi index without the values.
What I was able to do is:
count the XY
:
xy_count = df.groupby(['A', 'XY']).count().HK.unstack()
count the HK
:
hk_count = df.groupby(['A', 'HK']).count().XY.unstack()
merge the results (?)
results = pd.DataFrame()
results['XY'] = xy_count
results['HK'] = hk_count
but this last part does now work.
So my questions are:
Q1: There seems to be a better way I am missing Q2: why does step 3 not work?
Not really a question, but also I find confusing that to get the HK, you have to call the XY column and vice versa.
You can do:
groups = df.groupby('A')
pd.concat({col:groups[col].value_counts().unstack()
for col in ['XY','HK']}, axis=1)
Output:
XY HK
X Y H K
A
100 1.0 2.0 2.0 1.0
200 1.0 NaN 1.0 NaN