Search code examples
pythonpandaspandas-groupbymulti-index

pandas perform multiple transformations and creating a new data frame


I have a daframe where I want to perform multiple (independent) transformations, and they create a new data frame (with a multi-index), where each index correspond to one of the transformations.

More concretely:

df = pd.DataFrame([[1, "X", 'H', 100], [2, "Y", 'K', 100] ,[3, "X", 'H', 200], [4, "Y", 'H', 100]], columns=['id', 'XY', 'HK', 'A']).set_index("id")


    XY  HK  A
id          
1   X   H   100
2   Y   K   100
3   X   H   200
4   Y   H   100

I now want to know how many XY and how many HK have the same A. The final results should look like this (with multi-index):

     XY      HK
    X   Y   H   K   
A       
100 1.0 2.0 2.0 1.0
200 1.0 NaN 1.0 NaN

This is what I was expecting to work:

df.groupby(['A', 'XY', 'HK']).count()

but it only contains the multi index without the values.

What I was able to do is:

  1. count the XY:

    xy_count = df.groupby(['A', 'XY']).count().HK.unstack()
    
  2. count the HK:

    hk_count = df.groupby(['A', 'HK']).count().XY.unstack()
    
  3. merge the results (?)

    results = pd.DataFrame()
    results['XY'] = xy_count
    results['HK'] = hk_count 
    

but this last part does now work.

So my questions are:

Q1: There seems to be a better way I am missing Q2: why does step 3 not work?

Not really a question, but also I find confusing that to get the HK, you have to call the XY column and vice versa.


Solution

  • You can do:

    groups = df.groupby('A')
    pd.concat({col:groups[col].value_counts().unstack() 
               for col in ['XY','HK']}, axis=1)
    

    Output:

          XY        HK     
           X    Y    H    K
    A                      
    100  1.0  2.0  2.0  1.0
    200  1.0  NaN  1.0  NaN