How to fillna limited by date in a groupby


I am working with the following DataFrame, which contains some NaN values.

import numpy as np
import pandas as pd

# pd.datetime was removed in pandas 2.0, so the dates are built with pd.to_datetime
df = pd.DataFrame({'day': pd.to_datetime(['2020-01-01', '2020-01-03', '2020-01-04', '2020-01-05', '2020-01-06',
                                          '2020-01-07', '2020-01-08', '2020-01-08', '2020-06-09']),
                   'TradeID': ['01', '02', '03', '04', '05', '06', '07', '08', '09'],
                   'Security': ['GOOGLE', 'GOOGLE', 'APPLE', 'GOOGLE', 'GOOGLE', 'GOOGLE', 'GOOGLE', 'GOOGLE', 'GOOGLE'],
                   'ID': ['ID001', 'ID001', 'ID001', 'ID001', 'ID001', 'ID001', 'ID001', 'ID001', 'ID001'],
                   'BSType': ['B', 'S', 'B', 'B', 'B', 'S', 'S', 'S', 'B'],
                   'Price': [105.901, 106.969, np.nan, 107.037, 107.038, 107.136, np.nan, 107.25, np.nan],
                   'Quantity': [1000000, -300000, np.nan, 7500000, 100000, -100000, np.nan, -7800000, np.nan]
                  })

Out[318]: 
         day TradeID Security     ID BSType    Price   Quantity
0 2020-01-01      01   GOOGLE  ID001      B  105.901  1000000.0
1 2020-01-03      02   GOOGLE  ID001      S  106.969  -300000.0
2 2020-01-04      03    APPLE  ID001      B      NaN        NaN
3 2020-01-05      04   GOOGLE  ID001      B  107.037  7500000.0
4 2020-01-06      05   GOOGLE  ID001      B  107.038   100000.0
5 2020-01-07      06   GOOGLE  ID001      S  107.136  -100000.0
6 2020-01-08      07   GOOGLE  ID001      S      NaN        NaN
7 2020-01-08      08   GOOGLE  ID001      S  107.250 -7800000.0
8 2020-06-09      09   GOOGLE  ID001      B      NaN        NaN

My goal is to forward-fill the NaN values (fillna with the ffill method) only within the same Security and the same ID, and only for the next 60 days (not the next 60 observations, because there may be more than one observation per day).

Here is what I tried, but it does not work: none of my NaN values are replaced.

df=df.groupby(['day',"Security","ID"], as_index=False).fillna(method='ffill',limit=60)

The expected output should look like this (note that only the second pair of NaN values has been filled):

  • The first pair of NaN values should not be filled because it belongs to a different Security.
  • The second pair of NaN values should be filled with the previous observation.
  • The third pair of NaN values should not be filled because it falls outside the 60-day window.
Out[320]: 
         day TradeID Security     ID BSType    Price   Quantity
0 2020-01-01      01   GOOGLE  ID001      B  105.901  1000000.0
1 2020-01-03      02   GOOGLE  ID001      S  106.969  -300000.0
2 2020-01-04      03    APPLE  ID001      B      NaN        NaN
3 2020-01-05      04   GOOGLE  ID001      B  107.037  7500000.0
4 2020-01-06      05   GOOGLE  ID001      B  107.038   100000.0
5 2020-01-07      06   GOOGLE  ID001      S  107.136  -100000.0
6 2020-01-08      07   GOOGLE  ID001      S  107.136  -100000.0
7 2020-01-08      08   GOOGLE  ID001      S  107.250 -7800000.0
8 2020-06-09      09   GOOGLE  ID001      B      NaN        NaN

So my question is: is there a way to fill NaN values while limiting the ffill method to a certain time period?

Thank you very much for your time.


Solution

  • You can group the dataframe on the columns Security and ID, together with an additional pd.Grouper on the day column with the frequency set to 60 days (which splits the dates into fixed 60-day bins), then use ffill to forward-fill the values within each group:

    g = pd.Grouper(key='day', freq='60d')
    df.assign(**df.groupby(["Security","ID", g]).ffill())
    

             day TradeID Security     ID BSType    Price   Quantity
    0 2020-01-01      01   GOOGLE  ID001      B  105.901  1000000.0
    1 2020-01-03      02   GOOGLE  ID001      S  106.969  -300000.0
    2 2020-01-04      03    APPLE  ID001      B      NaN        NaN
    3 2020-01-05      04   GOOGLE  ID001      B  107.037  7500000.0
    4 2020-01-06      05   GOOGLE  ID001      B  107.038   100000.0
    5 2020-01-07      06   GOOGLE  ID001      S  107.136  -100000.0
    6 2020-01-08      07   GOOGLE  ID001      S  107.136  -100000.0
    7 2020-01-08      08   GOOGLE  ID001      S  107.250 -7800000.0
    8 2020-06-09      09   GOOGLE  ID001      B      NaN        NaN
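
A caveat on the approach above: two rows that are only a few days apart can still land in different bins, and a gap of up to almost 60 days inside one bin is measured from the bin start rather than from the last valid row. If "the next 60 days" is meant as 60 days counted from the last non-missing observation, one possible alternative is to forward-fill without a limit and then blank out any filled value that is too old. The following is only a sketch under that assumption; the names keys, max_gap, last_valid_day and result are made up for illustration, and the rows are assumed to be sorted by day already.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    "day": pd.to_datetime(["2020-01-01", "2020-01-03", "2020-01-04",
                           "2020-01-05", "2020-01-06", "2020-01-07",
                           "2020-01-08", "2020-01-08", "2020-06-09"]),
    "TradeID": ["01", "02", "03", "04", "05", "06", "07", "08", "09"],
    "Security": ["GOOGLE", "GOOGLE", "APPLE", "GOOGLE", "GOOGLE",
                 "GOOGLE", "GOOGLE", "GOOGLE", "GOOGLE"],
    "ID": ["ID001"] * 9,
    "BSType": ["B", "S", "B", "B", "B", "S", "S", "S", "B"],
    "Price": [105.901, 106.969, np.nan, 107.037, 107.038,
              107.136, np.nan, 107.25, np.nan],
    "Quantity": [1000000, -300000, np.nan, 7500000, 100000,
                 -100000, np.nan, -7800000, np.nan],
})

keys = [df["Security"], df["ID"]]  # fill only within the same Security and ID
max_gap = pd.Timedelta(days=60)    # assumed meaning of "the next 60 days"

result = df.copy()
for col in ["Price", "Quantity"]:
    # Date of the latest non-missing value of `col` seen so far in each group
    # (NaT if the group has no earlier valid value). Assumes rows sorted by day.
    last_valid_day = df["day"].where(df[col].notna()).groupby(keys).ffill()
    # Plain forward fill within each group, with no limit applied yet.
    filled = df[col].groupby(keys).ffill()
    # Keep a value only if it is at most `max_gap` old; NaT gaps compare as False.
    result[col] = filled.where(df["day"] - last_valid_day <= max_gap)

print(result)
```

On the sample data this fills only the 2020-01-08 row of TradeID 07 and leaves the APPLE row and the 2020-06-09 row as NaN, the same as the expected output in the question. If the frame is not sorted, sorting by day before the loop would be needed.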