
How to fill the average baseline of the control rows into the treatment rows' avgbaseline column


I have a pandas DataFrame as shown below.

Each sessionid represents an experiment. I need to set the baseline for each treatment group based on its sessionid. For example, in sessionid 'a' the control group (treatment 0) has an average response of 2 across its 3 rows, so 2 should be used as the baseline for the treatment group 'abcd'. Right now the treatment rows have NaN in that column.

How do I set the baseline of the treatment groups in pandas so that treatment 'abcd' in sessionid 'a' gets a baseline of 2, and likewise for all the other treatments?

I am a complete newbie and don't know how to write the code for this, so please forgive me.

treatment sessionid response avgbaseline
0            a         2          2
0            a         2          2
0            a         2          2
abcd         a         3          nan
abcd         a         3          nan
abcd         a         3          nan
0            b         1          1
0            b         1          1
0            b         1          1
efgh         b         2          nan
efgh         b         2          nan
efgh         b         2          nan
0            c         4          4
0            c         4          4
0            c         4          4
ijkl         c         5          nan
ijkl         c         5          nan
ijkl         c         5          nan
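To make the example reproducible, the input frame can be rebuilt as below. This is a sketch that assumes the control rows hold the integer 0 in `treatment` and that the missing baselines are NaN. Grouping only the control rows by `sessionid` reproduces the per-session averages described above.

```python
import numpy as np
import pandas as pd

# One entry per distinct (treatment, sessionid, response, avgbaseline) row;
# each row appears three times in the original table.
rows = [
    (0, 'a', 2, 2), ('abcd', 'a', 3, np.nan),
    (0, 'b', 1, 1), ('efgh', 'b', 2, np.nan),
    (0, 'c', 4, 4), ('ijkl', 'c', 5, np.nan),
]
df = pd.DataFrame(
    [r for r in rows for _ in range(3)],
    columns=['treatment', 'sessionid', 'response', 'avgbaseline'],
)

# Average response of the control rows (treatment 0) in each session
control_mean = df.loc[df['treatment'] == 0].groupby('sessionid')['response'].mean()
print(control_mean)
```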

Expected result:

treatment sessionid response avgbaseline
0            a         2          2
0            a         2          2
0            a         2          2
abcd         a         3          2
abcd         a         3          2
abcd         a         3          2
0            b         1          1
0            b         1          1
0            b         1          1
efgh         b         2          1
efgh         b         2          1
efgh         b         2          1
0            c         4          4
0            c         4          4
0            c         4          4
ijkl         c         5          4
ijkl         c         5          4
ijkl         c         5          4


Solution

  • If I understand correctly (IIUC), and the treatment 0 rows always come first within each session, you can use:

    # Within each session, forward-fill the control baseline into the treatment rows' NaNs
    df['avgbaseline'] = df.groupby('sessionid')['avgbaseline'].ffill()
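The forward fill relies on row order. If the control rows might not come first, one order-independent sketch is to compute each session's mean control response and map it onto every row of that session. This assumes, as in the question, that the control group is marked by treatment 0 and that the baseline is the mean of the control `response` values; the name `control_mean` is only illustrative.

```python
import numpy as np
import pandas as pd

rows = [
    (0, 'a', 2, 2), ('abcd', 'a', 3, np.nan),
    (0, 'b', 1, 1), ('efgh', 'b', 2, np.nan),
    (0, 'c', 4, 4), ('ijkl', 'c', 5, np.nan),
]
df = pd.DataFrame(
    [r for r in rows for _ in range(3)],
    columns=['treatment', 'sessionid', 'response', 'avgbaseline'],
)

# Shuffle the rows to show that this approach does not depend on ordering
df = df.sample(frac=1, random_state=0)

# Mean response of the control rows (treatment == 0) per session
control_mean = df.loc[df['treatment'] == 0].groupby('sessionid')['response'].mean()

# Look up each row's session in control_mean, filling control and treatment rows alike
df['avgbaseline'] = df['sessionid'].map(control_mean)
print(df.sort_index())
```

Because every row is assigned from the lookup, this also overwrites any existing control baselines, which matches the expected result as long as those were already the session means.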