
DatetimeIndex stops DataFrame from returning from a decorated function


I have a decorator that stores the return value of a function in a supplied dictionary or pandas DataFrame. This works fine as long as the returned data frames do not have differing DatetimeIndex values. I tried merging the data frames on their index instead, but then the collecting frame ends up empty.

So this code works fine:

    import functools
    import pandas as pd

    def add_return_to_dict_or_pandas_col_decorator(return_dict):
        def actual_decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                nonlocal return_dict
                # Store the result under the first positional argument (the identifier)
                return_dict[args[0]] = func(*args, **kwargs)
            return wrapper
        return actual_decorator

If applied to:

    accumulate_dict = dict()
    @add_return_to_dict_or_pandas_col_decorator(accumulate_dict)
    def f2(identifier, x):
        return x * x
    f2('thirty', 30)
    f2('three', 3)
    print(accumulate_dict)

    accumulate_df = pd.DataFrame()
    @add_return_to_dict_or_pandas_col_decorator(accumulate_df)
    def f3(identifier, x):
        return [x, x * x, x + x]
    f3('thirty', 30)
    f3('three', 3)
    print(accumulate_df)

But functions that return data frames with a DatetimeIndex make it fail, because the indexes do not line up. Here is an attempt at fixing that:

    def add_return_to_pandas_indexed_col_decorator(return_data_frame):
        def actual_decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                nonlocal return_data_frame
                if return_data_frame.shape[0] > 0:
                    return_data_frame = pd.merge(return_data_frame, func(*args, **kwargs),
                                                 how='outer', left_index=True, right_index=True)
                else:
                    return_data_frame = func(*args, **kwargs)
            return wrapper
        return actual_decorator

My test code runs through this without errors (just imagine the function returning a data frame with a DatetimeIndex), but the end result is an empty data frame:

    return_df = pd.DataFrame()
    tckrs = ['GLD', 'GDX']
    @add_return_to_pandas_indexed_col_decorator(return_df)
    def set_df_get_return_series(*args, **kwargs):
        return get_return_series(*args, **kwargs)

    for ticker in tckrs:
        set_df_get_return_series(ticker)
    print(return_df)

Where get_return_series is:

    import numpy as np

    def get_return_series(ticker):
        from faker import Faker
        fake = Faker()
        # Two random returns: one dated in the past, one dated today or later
        return pd.DataFrame(np.random.randn(2).tolist(),
                            columns=[ticker],
                            index=pd.DatetimeIndex([fake.date_between(start_date='-30y', end_date='-1d'),
                                                    fake.date_between(start_date='today', end_date='+30y')]))

Solution

  • Got a solution to this through a colleague (thanks, Dillon). The issue is the rebinding of the entire variable. Declaring return_data_frame as nonlocal makes the assignment rebind the decorator's own parameter, so the merged frame does survive between calls inside the closure, but the outer name return_df still refers to the original, empty DataFrame. Code inside the decorator cannot point the caller's name at a different object; it can only mutate the object that name refers to. This also explains why the previous implementation works (item assignment changes the dict or frame in place) but the indexed one does not. So the problem is not directly related to the DatetimeIndex.
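
The difference between rebinding a name and mutating an object can be reproduced without pandas. The following is a minimal sketch, not part of the original code: make_rebinder and make_mutator are hypothetical helper names, and plain lists stand in for the data frames.

```python
def make_rebinder(target):
    """Closure that rebinds its own parameter, like the failing decorator."""
    def add(item):
        nonlocal target
        # Builds a NEW list and rebinds only the closure's name `target`
        target = target + [item]
        return target
    return add


def make_mutator(target):
    """Closure that changes the object it was given in place."""
    def add(item):
        # In-place change, visible through every reference to the list
        target.append(item)
        return target
    return add


rebound = []
mutated = []
rebind_add = make_rebinder(rebound)
mutate_add = make_mutator(mutated)
for value in (1, 2):
    rebind_add(value)
    mutate_add(value)

print(rebound)  # [] : the caller's list was never touched
print(mutated)  # [1, 2]
```

The rebinding closure does keep its own growing list between calls; the caller simply has no name that points to it.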

    Added an extra indirection to make it work. If anybody can find a nicer implementation, please post:

    def add_return_to_pandas_indexed_col_decorator(return_object):
        def actual_decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                # No nonlocal needed: return_object is never rebound, only its attribute is set
                if return_object.frame is not None and not return_object.frame.empty:
                    return_object.frame = pd.merge(return_object.frame, func(*args, **kwargs),
                                                   how='outer', left_index=True, right_index=True)
                else:
                    return_object.frame = func(*args, **kwargs)
            return wrapper
        return actual_decorator
    

    To be used as follows (it is probably a good idea to integrate the Test class into the decorator):

    class Test(object):
        def __init__(self):
            self.frame = pd.DataFrame()
    
    tckrs = ['GLD', 'GDX']
    accumulate_object = Test()
    @add_return_to_pandas_indexed_col_decorator(accumulate_object)
    def set_df_get_return_series(*args, **kwargs):
        return get_return_series(*args, **kwargs)
    for ticker in tckrs:
        set_df_get_return_series(ticker)
    print(accumulate_object.frame)
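
One possible way to fold the holder into the decorator is to keep the accumulated frame as an attribute on the wrapper function itself. This is only a sketch under assumptions: the names accumulate_indexed_frames and get_fixed_series are hypothetical, and get_fixed_series is a deterministic stand-in for get_return_series so the result is reproducible.

```python
import functools

import pandas as pd


def accumulate_indexed_frames(func):
    """Collect each DataFrame returned by func on wrapper.frame (hypothetical name)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        if wrapper.frame is None:
            wrapper.frame = result
        else:
            # Outer merge on the index keeps dates present in only one frame
            wrapper.frame = pd.merge(wrapper.frame, result, how='outer',
                                     left_index=True, right_index=True)
        return result  # the caller still gets the original return value

    # Setting an attribute mutates the function object; no outer name is rebound
    wrapper.frame = None
    return wrapper


@accumulate_indexed_frames
def get_fixed_series(ticker, dates, values):
    # Stand-in for get_return_series with fixed dates and values
    return pd.DataFrame({ticker: values}, index=pd.DatetimeIndex(dates))


get_fixed_series('GLD', ['2020-01-01', '2020-01-02'], [0.1, 0.2])
get_fixed_series('GDX', ['2020-01-02', '2020-01-03'], [0.3, 0.4])
print(get_fixed_series.frame)
```

Each decorated function carries its own frame this way, so no separate holder object has to be created and passed in. Calling the function twice with the same ticker would produce suffixed duplicate columns from the merge, which this sketch does not handle.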