
rolling apply in pandas shows "TypeError: only length-1 arrays can be converted to Python scalars"


The DataFrame df_implied_full has several columns. One of them is called 'USDZARV1Y Curncy', and it contains only floats.

This code works:

mad                          = lambda x: np.median(np.fabs(x - np.median(x)))
df_implied_full['madtest']   = df_implied_full['USDZARV1Y Curncy'].rolling(window=60).apply(mad)

This code doesn't work:

test                         = lambda x: (x - np.median(x))
df_implied_full['rolltest2'] = df_implied_full['USDZARV1Y Curncy'].rolling(window=60).apply(test)

The error shown is:

File "pandas\algos.pyx", line 1831, in pandas.algos.roll_generic (pandas\algos.c:51581)

TypeError: only length-1 arrays can be converted to Python scalars

I'm using pandas 0.18.1 and Python 2.7.12.

What is wrong with my code?


Solution

  • The problem is that x in lambda x: (x - np.median(x)) is a numpy array, so the function returns an array, and rolling apply cannot convert an array into the single scalar value it needs for each row. Even test = lambda x: x fails for the same reason. I think you need to return a scalar value only, e.g. x[0] or np.median(x). The best approach is to write a custom function and test it.

    Sample with window=2:

    import pandas as pd
    import numpy as np
    
    df_implied_full = pd.DataFrame({'USDZARV1Y Curncy': [1.2,4.6,7.3,4.9,1.5]})
    print (df_implied_full)
    
    def test (x):
        print (x)
    
        #[ 1.2  4.6]
        #[ 4.6  7.3]
        #[ 7.3  4.9]
        #[ 4.9  1.5]
    
        print (type(x))
        #<class 'numpy.ndarray'>
        #<class 'numpy.ndarray'>
        #<class 'numpy.ndarray'>
        #<class 'numpy.ndarray'>
    
        #Return only the first value of the window array (a scalar)
        return x[0]
    
    mad                          = lambda x: np.median(np.fabs(x - np.median(x)))
    df_implied_full['madtest']   = df_implied_full['USDZARV1Y Curncy'].rolling(window=2).apply(test)
    
    print (df_implied_full)
       USDZARV1Y Curncy  madtest
    0               1.2      NaN
    1               4.6      1.2
    2               7.3      4.6
    3               4.9      7.3
    4               1.5      4.9
    

    def test (x):
        print (x)
    
        #[ 1.2  4.6]
        #[ 4.6  7.3]
        #[ 7.3  4.9]
        #[ 4.9  1.5]
    
        #Return median as scalar
        return np.median(x)
    
    mad                          = lambda x: np.median(np.fabs(x - np.median(x)))
    df_implied_full['madtest']   = df_implied_full['USDZARV1Y Curncy'].rolling(window=2).apply(test)
    
    print (df_implied_full)
       USDZARV1Y Curncy  madtest
    0               1.2      NaN
    1               4.6     2.90
    2               7.3     5.95
    3               4.9     6.10
    4               1.5     3.20
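
If the goal of the original test lambda was the deviation of each row from its rolling median, one way to get it is sketched below. This is an assumption about what the asker intended, not something stated in the question. The column names dev_vectorized and dev_apply are made up for illustration, and the raw=True argument assumes a newer pandas than 0.18.1 (it makes each window arrive as a numpy array, which is what 0.18.1 passed by default).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'USDZARV1Y Curncy': [1.2, 4.6, 7.3, 4.9, 1.5]})
col = df['USDZARV1Y Curncy']

# Option 1: no apply at all. Subtract the rolling median from the column,
# so each row gets (current value - median of its window).
df['dev_vectorized'] = col - col.rolling(window=2).median()

# Option 2: keep rolling apply, but return ONE scalar per window:
# the last value in the window minus the window median.
# raw=True is an assumption about the pandas version (not available in 0.18.1).
df['dev_apply'] = col.rolling(window=2).apply(
    lambda x: x[-1] - np.median(x), raw=True
)

print(df)
```

For the sample data both columns should hold NaN, 1.7, 1.35, -1.2, -1.7, up to floating-point rounding. The vectorized form is usually the simpler choice, since it avoids calling a Python function once per window.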