Tags: python, pandas, dataframe, apply, iterable-unpacking

Apply a function that returns multiple values to the rows of a pandas DataFrame


I have a DataFrame with a time index and 3 columns containing the coordinates of a 3D vector:

                         x             y             z
ts
2014-05-15 10:38         0.120117      0.987305      0.116211
2014-05-15 10:39         0.117188      0.984375      0.122070
2014-05-15 10:40         0.119141      0.987305      0.119141
2014-05-15 10:41         0.116211      0.984375      0.120117
2014-05-15 10:42         0.119141      0.983398      0.118164

I would like to apply a transformation to each row that also returns a vector:

def myfunc(a, b, c):
    # do something with a, b, c to compute e, f, g
    return e, f, g

but if I do:

df.apply(myfunc, axis=1)

I end up with a pandas Series whose elements are tuples. This is because apply takes the result of myfunc without unpacking it. How can I change myfunc so that I obtain a new DataFrame with 3 columns?

Edit:

All solutions below work. The Series solution allows for column names; the list solution seems to execute faster.

def myfunc1(args):
    e=args[0] + 2*args[1]
    f=args[1]*args[2] +1
    g=args[2] + args[0] * args[1]
    return pd.Series([e,f,g], index=['a', 'b', 'c'])

def myfunc2(args):
    e=args[0] + 2*args[1]
    f=args[1]*args[2] +1
    g=args[2] + args[0] * args[1]
    return [e,f,g]

%timeit df.apply(myfunc1 ,axis=1)

100 loops, best of 3: 4.51 ms per loop

%timeit df.apply(myfunc2 ,axis=1)

100 loops, best of 3: 2.75 ms per loop

Solution

  • Just return a list instead of a tuple.

    In [81]: df
    Out[81]: 
                                x         y         z
    ts                                               
    2014-05-15 10:38:00  0.120117  0.987305  0.116211
    2014-05-15 10:39:00  0.117188  0.984375  0.122070
    2014-05-15 10:40:00  0.119141  0.987305  0.119141
    2014-05-15 10:41:00  0.116211  0.984375  0.120117
    2014-05-15 10:42:00  0.119141  0.983398  0.118164
    
    [5 rows x 3 columns]
    
    In [82]: def myfunc(args):
       ....:        e=args[0] + 2*args[1]
       ....:        f=args[1]*args[2] +1
       ....:        g=args[2] + args[0] * args[1]
       ....:        return [e,f,g]
       ....: 
    
    In [83]: df.apply(myfunc ,axis=1)
    Out[83]: 
                                x         y         z
    ts                                               
    2014-05-15 10:38:00  2.094727  1.114736  0.234803
    2014-05-15 10:39:00  2.085938  1.120163  0.237427
    2014-05-15 10:40:00  2.093751  1.117629  0.236770
    2014-05-15 10:41:00  2.084961  1.118240  0.234512
    2014-05-15 10:42:00  2.085937  1.116202  0.235327
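
A note for newer pandas versions: as far as I know, since pandas 0.23 a list returned from the applied function is no longer expanded into columns by default, and you get a Series of lists instead. The sketch below assumes pandas 0.23 or later and uses the result_type argument of DataFrame.apply to ask for the expansion explicitly. The two-row frame and the column names a, b, c are just for illustration.

```python
import pandas as pd

# Two sample rows from the question are enough to show the behaviour.
df = pd.DataFrame(
    {
        "x": [0.120117, 0.117188],
        "y": [0.987305, 0.984375],
        "z": [0.116211, 0.122070],
    },
    index=pd.to_datetime(["2014-05-15 10:38", "2014-05-15 10:39"]),
)


def myfunc(row):
    # Same arithmetic as in the answer, using column labels.
    e = row["x"] + 2 * row["y"]
    f = row["y"] * row["z"] + 1
    g = row["z"] + row["x"] * row["y"]
    return [e, f, g]


# result_type="expand" turns each list-like result into separate columns.
expanded = df.apply(myfunc, axis=1, result_type="expand")

# The expanded columns are numbered 0, 1, 2; rename them if labels are needed.
expanded.columns = ["a", "b", "c"]
print(expanded)
```

According to the pandas documentation, result_type="broadcast" is the option that keeps the original column names (x, y, z), which matches the output shown above.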