I have a pandas DataFrame, shown below.
Each sessionid represents an experiment, and I need to set the baseline for each treatment based on its sessionid. For example, in sessionid 'a' the control group (treatment 0) has an average response of 2 across its 3 records, so the treatment group 'abcd' in the same session should get that control average, 2, as its baseline. Right now those values are NaN.
How do I set the baseline for the treatment groups in pandas so that sessionid 'a', treatment 'abcd' gets a baseline of 2, and likewise for all other treatments?
I am a complete newbie and don't know how to write the code for this, so please forgive me.
treatment sessionid response avgbaseline
0 a 2 2
0 a 2 2
0 a 2 2
abcd a 3 nan
abcd a 3 nan
abcd a 3 nan
0 b 1 1
0 b 1 1
0 b 1 1
efgh b 2 nan
efgh b 2 nan
efgh b 2 nan
0 c 4 4
0 c 4 4
0 c 4 4
ijkl c 5 nan
ijkl c 5 nan
ijkl c 5 nan
#expected result
treatment sessionid response avgbaseline
0 a 2 2
0 a 2 2
0 a 2 2
abcd a 3 2
abcd a 3 2
abcd a 3 2
0 b 1 1
0 b 1 1
0 b 1 1
efgh b 2 1
efgh b 2 1
efgh b 2 1
0 c 4 4
0 c 4 4
0 c 4 4
ijkl c 5 4
ijkl c 5 4
ijkl c 5 4
If I understand correctly, and treatment 0 always comes first within each session, you can forward-fill the baseline within each sessionid group:
df['avgbaseline'] = df.groupby('sessionid')['avgbaseline'].ffill()
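If the control rows are not guaranteed to come first in each session, one possible alternative is to compute the mean control response per session and map it back onto every row, which does not depend on row order. This is only a sketch under some assumptions: the control group is labelled 0 (either the integer or the string '0'), avgbaseline is meant to be the mean of the control rows' response, and the DataFrame below simply rebuilds the sample data from the question.

```python
import pandas as pd

# Rebuild the sample data from the question (treatment stored as strings here;
# that is an assumption, the real column may hold the integer 0 instead).
df = pd.DataFrame({
    "treatment": ["0"] * 3 + ["abcd"] * 3 + ["0"] * 3 + ["efgh"] * 3 + ["0"] * 3 + ["ijkl"] * 3,
    "sessionid": ["a"] * 6 + ["b"] * 6 + ["c"] * 6,
    "response": [2] * 3 + [3] * 3 + [1] * 3 + [2] * 3 + [4] * 3 + [5] * 3,
})

# Flag control rows; casting to str handles both 0 and "0".
is_control = df["treatment"].astype(str) == "0"

# Mean control response per session: a Series indexed by sessionid.
baseline = df.loc[is_control].groupby("sessionid")["response"].mean()

# Look up each row's session baseline, for control and treatment rows alike.
df["avgbaseline"] = df["sessionid"].map(baseline)

print(df)
```

Unlike the ffill approach, this overwrites avgbaseline for the control rows too (with the same per-session mean), and a session with no control rows would end up with NaN.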