python pandas pandas-groupby multi-index

pandas perform multiple transformations and creating a new data frame

I have a daframe where I want to perform multiple (independent) transformations, and they create a new data frame (with a multi-index), where each index correspond to one of the transformations.

More concretely:

df = pd.DataFrame([[1, "X", 'H', 100], [2, "Y", 'K', 100] ,[3, "X", 'H', 200], [4, "Y", 'H', 100]], columns=['id', 'XY', 'HK', 'A']).set_index("id")


    XY  HK  A
id          
1   X   H   100
2   Y   K   100
3   X   H   200
4   Y   H   100

I now want to know how many XY and how many HK have the same A. The final results should look like this (with multi-index):

     XY      HK
    X   Y   H   K   
A       
100 1.0 2.0 2.0 1.0
200 1.0 NaN 1.0 NaN

This is what I was expecting to work:

df.groupby(['A', 'XY', 'HK']).count()

but it only contains the multi index without the values.

What I was able to do is:

count the XY:

xy_count = df.groupby(['A', 'XY']).count().HK.unstack()

count the HK:

hk_count = df.groupby(['A', 'HK']).count().XY.unstack()

merge the results (?)

results = pd.DataFrame()
results['XY'] = xy_count
results['HK'] = hk_count

but this last part does now work.

So my questions are:

Q1: There seems to be a better way I am missing Q2: why does step 3 not work?

Not really a question, but also I find confusing that to get the HK, you have to call the XY column and vice versa.

Solution

You can do:

groups = df.groupby('A')
pd.concat({col:groups[col].value_counts().unstack() 
           for col in ['XY','HK']}, axis=1)

Output:

      XY        HK     
       X    Y    H    K
A                      
100  1.0  2.0  2.0  1.0
200  1.0  NaN  1.0  NaN