Tags: python, pandas, multiple-return-values

Return multiple values from a pandas rolling apply function


I have a function that needs to return multiple values:

def max_dd(ser):
    ...  # compute i, j, dd
    return i, j, dd

I call this function through rolling apply, which passes in a series:

 date1, date2, dd = df.rolling(window).apply(max_dd)

However, I get an error:

pandas.core.base.DataError: No numeric types to aggregate

If I return a single value from max_dd, everything is fine. How do I return multiple values from a function that is passed to rolling apply?


Solution

  • Rolling apply can only produce a single numeric value per window. It does not support multiple return values, or even non-numeric ones (something as simple as a string). Any answer to this question will be a workaround.

    That said, a viable workaround is to take advantage of the fact that rolling objects are iterable (as of pandas 1.1.0).

    What’s new in 1.1.0 (July 28, 2020)

    • Made pandas.core.window.rolling.Rolling and pandas.core.window.expanding.Expanding iterable (GH11704)

    This makes it possible to keep the faster grouping and indexing operations of rolling while getting more flexible behaviour from plain Python:

    import pandas as pd


    def some_fn(df_):
        """
        Summarise one rolling window.

        Iterating over a rolling object disregards the min_periods
        argument and yields a DataFrame for every window, including the
        shorter ones at the start. The input is a DataFrame, not a Series.

        You are responsible for all operations here, including skipping
        windows whose shape or format is not what you expect.

        :param df_: a DataFrame produced by iterating over rolling
        :return: the values of column 'a' joined with commas, and the
            max value within the window
        """
        return ','.join(df_['a']), df_['a'].max()


    window = 5
    # df is the sample DataFrame defined below
    results = pd.DataFrame([some_fn(df_) for df_ in df.rolling(window)])
    

    Sample DataFrame and output:

    df = pd.DataFrame({'a': list('abdesfkm')})
    

    df:

       a
    0  a
    1  b
    2  d
    3  e
    4  s
    5  f
    6  k
    7  m
    

    results:

               0  1
    0          a  a
    1        a,b  b
    2      a,b,d  d
    3    a,b,d,e  e
    4  a,b,d,e,s  s
    5  b,d,e,s,f  s
    6  d,e,s,f,k  s
    7  e,s,f,k,m  s
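
The same iteration pattern can be applied to the drawdown case from the question. The following is only a sketch: the question does not show how max_dd computes its values, so the body below is a hypothetical stand-in (peak label, trough label, and the largest relative drop), and the names prices and dd_table and the sample numbers are made up for illustration. Because iteration ignores min_periods, the loop skips incomplete windows itself.

```python
import pandas as pd


def max_dd(window_ser):
    """Hypothetical stand-in for the question's max_dd.

    Returns (peak label, trough label, drawdown) for one window, where
    drawdown is the largest relative drop from a running maximum.
    """
    running_max = window_ser.cummax()
    drawdown = window_ser / running_max - 1.0  # 0 at new highs, negative below them
    trough = drawdown.idxmin()                 # label of the deepest drop
    peak = window_ser.loc[:trough].idxmax()    # highest point up to that trough
    return peak, trough, drawdown.loc[trough]


# Made-up sample data, not from the original post
prices = pd.Series(
    [100.0, 120.0, 90.0, 110.0, 80.0, 130.0],
    index=pd.date_range("2021-01-01", periods=6, freq="D"),
)
window = 3

rows, labels = [], []
for win in prices.rolling(window):  # iterating needs pandas >= 1.1.0
    if len(win) < window:
        continue  # min_periods is not applied when iterating, so filter here
    rows.append(max_dd(win))
    labels.append(win.index[-1])  # label each result by the window's last row

dd_table = pd.DataFrame(rows, index=labels, columns=["date1", "date2", "dd"])
print(dd_table)
```

Each window contributes one row, so the three return values end up as three columns instead of being unpacked from a single apply call. This runs a Python-level loop, so it will be slower than a built-in rolling aggregation on large inputs.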