
pandas groupby transform: multiple functions applied at the same time with custom names


As the title suggests, I want to be able to do the following (best explained with some code). Using pandas 0.20.1 is mandatory.

import pandas as pd
import numpy as np

a = pd.DataFrame(np.random.rand(10, 4), columns=[['a','a','b','b'], ['alfa','beta','alfa','beta',]])

def as_is(x):
    return x
def power_2(x):
    return x**2

# desired result

a.transform([as_is, power_2])

The problem is that the functions could be more complex than this, in which case I would lose the "naming" feature: pandas.DataFrame.transform only allows lists to be passed, whereas a dictionary would have been most convenient.

Going back to basics, I got to this:

dict_funct = {'as_is': as_is, 'power_2': power_2}

def wrapper(x):
    return pd.concat({k: x.apply(v) for k,v in dict_funct.items()}, axis=1)

a.groupby(level=[0,1], axis=1).apply(wrapper)

But the output DataFrame is all NaN, presumably due to the ordering of the multi-index columns. Is there any way I can fix this?


Solution

  • If you need the dict, remove the axis parameter from concat so that it falls back to the default (axis=0); it is then necessary to add the parameter group_keys=False and to call unstack:

    def wrapper(x):
        return pd.concat({k: x.apply(v) for k,v in dict_funct.items()})
    
    a.groupby(level=[0,1], axis=1, group_keys=False).apply(wrapper).unstack(0)
    

    A similar solution, using transform instead of apply:

    def wrapper(x):
        return pd.concat({k: x.transform(v) for k,v in dict_funct.items()})
    
    a.groupby(level=[0,1], axis=1, group_keys=False).apply(wrapper).unstack(0)
    

    Another solution is simply to use a list comprehension:

    a.transform([v for k, v in dict_funct.items()])
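
The list-comprehension variant passes only the functions, so the new column level is labelled with each function's `__name__` and the dictionary keys are dropped. If the keys should become the labels instead, one possible sketch is to wrap each function and set `__name__` on the wrapper. The helper `with_name` and the keys `identity` and `squared` are hypothetical names introduced here for illustration, and this was not checked against pandas 0.20.1 specifically:

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(
    np.random.rand(10, 4),
    columns=[['a', 'a', 'b', 'b'], ['alfa', 'beta', 'alfa', 'beta']],
)

def as_is(x):
    return x

def power_2(x):
    return x ** 2

# Keys deliberately differ from the function names to show the renaming.
dict_funct = {'identity': as_is, 'squared': power_2}

def with_name(func, name):
    # Hypothetical helper: return a thin wrapper around func whose
    # __name__ is the given label, leaving the original function untouched.
    def wrapped(x):
        return func(x)
    wrapped.__name__ = name
    return wrapped

# transform labels the extra column level with each callable's __name__,
# so the dictionary keys end up as the innermost column level.
result = a.transform([with_name(v, k) for k, v in dict_funct.items()])
```

A single transformed column can then be selected with the full key, for example `result[('a', 'alfa', 'squared')]`.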