
pandas dataframe apply function to columns with nans


I have a large pandas dataframe with many time series of different lengths which looks something like this:

import numpy as np
import pandas as pd

df = pd.DataFrame({'date': pd.date_range(start='2013-01-01', periods=30, freq='Q'),
                   'series1': range(30),
                   'series2': range(30),
                   'series3': range(30)}).set_index('date')

df.iloc[:11, df.columns.get_loc('series1')] = np.nan
df.iloc[:6, df.columns.get_loc('series2')] = np.nan

I want to apply a statistical filter from statsmodels to each time series, but the function can't handle missing values and returns output full of NaNs.

import statsmodels.api as sm
df_cycles, df_trend = sm.tsa.filters.cffilter(df, low=6, high=40)

Is there a clean way to apply the function to each column individually, starting from its first non-NaN observation? I don't want to drop every row that contains a missing value, since that would also truncate the complete series.


Solution

  • You can apply the filter to each column individually and store the results in a container. Here I use a dictionary with the column name as the key and a dataframe containing the cycles and trend as the value.

    filtered = {}
    for c in df.columns:
        # drop this column's NaNs only, then filter the remaining values
        cycles, trend = sm.tsa.filters.cffilter(
            df[c].astype(float).dropna(),
            low=6,
            high=40
        )
        filtered[c] = pd.DataFrame({'cycles': cycles, 'trend': trend})
    
    
    print(filtered['series1'])
    
    #                  cycles  trend
    #date                           
    #2015-12-31  8.881784e-16   11.0
    #2016-03-31  6.661338e-16   11.0
    #...
    
    
    print(filtered['series3'])
    
    #            cycles  trend
    #date                     
    #2013-03-31     0.0    0.0
    #2013-06-30     0.0    0.0
    #...
    

    As you can see, each column's NaN observations were dropped independently, without affecting the other columns.
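
    If you want the results back in wide form, you can reassemble the dictionary into one dataframe per output with pd.concat and realign it to the original index, so the rows that were dropped before filtering reappear as NaN. A minimal sketch with stand-in data (the small `filtered` dict and `idx` below are hypothetical placeholders for the loop's output, since the values depend on the filter):

    ```python
    import pandas as pd

    # Hypothetical stand-in for the loop's output: column name ->
    # DataFrame of 'cycles'/'trend', each starting at that column's
    # first non-NaN observation.
    idx = pd.date_range('2013-01-01', periods=6, freq='QS')
    filtered = {
        'series1': pd.DataFrame({'cycles': 0.0, 'trend': 1.0}, index=idx[2:]),
        'series3': pd.DataFrame({'cycles': 0.0, 'trend': 1.0}, index=idx),
    }

    # One wide frame per output: columns are the series names, rows
    # realigned to the full index, missing leading values back as NaN.
    cycles_wide = pd.concat({k: v['cycles'] for k, v in filtered.items()},
                            axis=1).reindex(idx)
    trend_wide = pd.concat({k: v['trend'] for k, v in filtered.items()},
                           axis=1).reindex(idx)

    print(cycles_wide)
    ```

    The same reindex against df.index works for the real `filtered` dict from the loop above.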