python, pandas, conditional-statements, fillna

fillna with a condition (time limitation)


Thanks in advance for checking the question.

I have a dataset with a lot of missing values in the column "bond_yield". My first question, which required me to fill the NaN values with the previous data, has been solved. My code is:

# sort the data first by company (gvkey) and then by date
df_dataset = df_dataset.sort_values(by=['gvkey', 'date'])

# fill in missing bond yields with the previous day's data, with one condition:
# fill a missing value only if the row belongs to the same company as the row above it; otherwise, leave it missing.
# this is important to avoid filling the first row of company B with data from the last row of company A.
# groupby('gvkey') restricts the forward fill to each company
# (same result as fillna(method='ffill'), which is deprecated in recent pandas versions).
df_dataset['bond_yield'] = df_dataset.groupby('gvkey')['bond_yield'].ffill()

df_dataset.head()

       company             gvkey  date      market_cds_spread  bond_yield
34315  AMCN.AIRLNS.GP.INC  1045   20040101  NaN                NaN
34316  AMCN.AIRLNS.GP.INC  1045   20040102  NaN                NaN
34317  AMCN.AIRLNS.GP.INC  1045   20040105  NaN                NaN
34318  AMCN.AIRLNS.GP.INC  1045   20040106  NaN                NaN
34319  AMCN.AIRLNS.GP.INC  1045   20040107  NaN                NaN

(Yes, the first rows are all NaN, but later rows do have values.)
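A minimal, self-contained check of the group-wise forward fill above, using a small hypothetical frame (the companies, dates and yields are made up for illustration). It shows that company 2's leading NaN is not filled from company 1's last value.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two companies; company 2 starts with a missing yield.
df = pd.DataFrame({
    "gvkey": [1, 1, 1, 2, 2],
    "date": pd.to_datetime(["2004-01-01", "2004-01-02", "2004-01-05",
                            "2004-01-01", "2004-01-02"]),
    "bond_yield": [5.0, np.nan, np.nan, np.nan, 6.0],
})

df = df.sort_values(["gvkey", "date"])
# Forward-fill within each company only, so values never cross a company boundary.
df["bond_yield"] = df.groupby("gvkey")["bond_yield"].ffill()
print(df["bond_yield"].tolist())
```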

Now I'm asked to fill the data with an additional condition: fill in a missing bond yield with a value from previous days only when that previous bond yield is from no more than 15 days ago.
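One possible sketch of the 15-day rule, assuming "15 days" means calendar days and that `date` holds integers like 20040101 (converted with `pd.to_datetime`). The idea is to forward-fill the *date* of the last real observation alongside the value, then keep a filled value only if that source date is at most 15 days older than the current row. The helper name `ffill_within_days` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def ffill_within_days(df, group_col, date_col, value_col, max_days=15):
    """Hypothetical helper: forward-fill value_col within each group, but only
    from an observation at most max_days calendar days older than the row."""
    out = df.sort_values([group_col, date_col]).copy()
    # Date of each real (non-missing) observation; NaT where the value is missing.
    last_obs = out[date_col].where(out[value_col].notna())
    # Carry that date forward within each group, exactly like the values.
    last_obs = last_obs.groupby(out[group_col]).ffill()
    filled = out.groupby(group_col)[value_col].ffill()
    age = out[date_col] - last_obs
    # Keep a filled value only if its source is recent enough; NaT ages compare False.
    out[value_col] = filled.where(age <= pd.Timedelta(days=max_days))
    return out


# Hypothetical toy data in the question's integer date format.
df = pd.DataFrame({
    "gvkey": [1045] * 5 + [2000],
    "date": [20040101, 20040110, 20040120, 20040121, 20040130, 20040102],
    "bond_yield": [4.0, np.nan, np.nan, 4.5, np.nan, np.nan],
})
df["date"] = pd.to_datetime(df["date"].astype(str), format="%Y%m%d")

result = ffill_within_days(df, "gvkey", "date", "bond_yield", max_days=15)
print(result["bond_yield"].tolist())
```

Here the 2004-01-10 row is filled (9 days after the last value), the 2004-01-20 row stays NaN (19 days), and company 2000's first row stays NaN because it has no earlier value. A gap of exactly 15 days is filled; use `<` instead of `<=` if the boundary should be excluded.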

I was thinking about isnull().sum(), but it didn't work: it just counts all missing values, while I would like to count, from a given row, back to the previous available value. (I had no clear idea and was just trying anything that might help, so my attempt was not very direct.)
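What the isnull().sum() attempt seems to be aiming for, counting back from each row to the previous available value, can be sketched with a per-company running block id plus `cumcount`. This is an illustrative sketch on hypothetical data; note that it counts rows, not calendar days, so it only matches a day limit if every calendar day has a row.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: company 2 has no valid value at all.
df = pd.DataFrame({
    "gvkey": [1, 1, 1, 1, 2, 2],
    "bond_yield": [3.0, np.nan, np.nan, 3.2, np.nan, np.nan],
})

# Each non-missing value starts a new block; block 0 means "no value seen yet".
block = df["bond_yield"].notna().astype(int).groupby(df["gvkey"]).cumsum()
# Rows since the last valid value: 0 on the valid row, then 1, 2, ... on the NaNs.
rows_since_valid = df.groupby([df["gvkey"], block]).cumcount()
# Leading NaNs have no earlier value to count from, so mark them as NaN.
rows_since_valid = rows_since_valid.where(block > 0)
print(rows_since_valid.tolist())
```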

How can I use fillna with this 15-day limit?


Solution

  • Sorry, this should be a comment, but I cannot leave one yet. fillna has a keyword argument, limit, which caps how many consecutive NaN values are filled. I think it can do what you want. https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.fillna.html#pandas.DataFrame.fillna
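A short sketch of the `limit` suggestion combined with the group-wise fill, on hypothetical data. `limit` counts consecutive missing rows, not calendar days; since the dataset skips weekends (20040102 is followed by 20040105), something like `ffill(limit=15)` on the real data would allow 15 rows, which can span more than 15 calendar days.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a weekend gap between 2004-01-02 and 2004-01-05.
df = pd.DataFrame({
    "gvkey": [1045] * 4,
    "date": pd.to_datetime(["2004-01-02", "2004-01-05", "2004-01-06", "2004-01-07"]),
    "bond_yield": [4.0, np.nan, np.nan, np.nan],
})

# limit=2 fills at most 2 consecutive NaNs per company (counted in rows, not days).
df["filled"] = df.groupby("gvkey")["bond_yield"].ffill(limit=2)
print(df["filled"].tolist())
```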