Search code examples
pythonpandasdataframetime-seriesfillna

Pandas resample based on string like PeriodIndex


I have a dataframe like as below

df = pd.DataFrame({'subject_id':[1,1,1,1,1,2,2,2,2,2],
                   'qtr_info' :['2017Q1','2017Q3','2017Q4','2018Q1','2018Q4','2017Q1','2017Q4','2018Q2','2018Q4','2019Q1'],
                   'val' :[5,5,5,5,1,6,5,5,8,3],
                   'Prod_id':['A','B','C','A','E','Q','G','F','G','H']})

I would like to do the below

a) Fill all the missing quarters of a subject

b) fillna for other columns using the mean value for respective columns (for the same subject). Don't refer other subject ids for computing mean value

So, I tried the below

df_time.resample('Q').mean()   #didn't work
df_time.groupby('subject_id').resample('Q).mean()  #didn't work

But I got the below error

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Int64Index'

I expect my output to be like as below for subject_id = 1. Here yellow color shows newly inserted rows (to fill the missing gaps)

enter image description here


Solution

  • Create PeriodIndex first and for resample use Resampler.agg with dictionary, for replace missing values by means use Series.fillna with GroupBy.transform:

    df_time['qtr_info'] = pd.PeriodIndex(df_time['qtr_info'], freq='Q')
    df = (df_time.set_index('qtr_info')
                 .groupby('subject_id')
                 .resample('Q')
                 .agg({'val':'mean', 'Prod_id':'ffill'}))
    
    df['val'] = df['val'].fillna(df.groupby(level=0)['val'].transform('mean'))
    print (df)
                         val Prod_id
    subject_id qtr_info             
    1          2017Q1    5.0       A
               2017Q2    4.2       A
               2017Q3    5.0       B
               2017Q4    5.0       C
               2018Q1    5.0       A
               2018Q2    4.2       A
               2018Q3    4.2       A
               2018Q4    1.0       E
    2          2017Q1    6.0       Q
               2017Q2    5.4       Q
               2017Q3    5.4       Q
               2017Q4    5.0       G
               2018Q1    5.4       G
               2018Q2    5.0       F
               2018Q3    5.4       F
               2018Q4    8.0       G
               2019Q1    3.0       H