I have an analysis result table
Name Analysis Result Date Type
0 Doe J. Albumine 10.6 23.02.2021 8:07:22 before
1 Doe J. Albumine 6.5 25.03.2021 8:08:09 after
2 Pine C. Albumine 13.3 25.03.2021 9:17:54 before
3 Pine C. Albumine 11.0 22.02.2021 9:25:54 after
4 Jackson D. Albumine 14.2 23.02.2021 10:51:38 before
5 Jackson D Albumine 12.2 23.03.2021 after
6 Schafer L. Albumine 8.4 25.02.2021 10:39:39 before
7 Schafer L. Albumine 9.3 25.03.2021 12:06:15 after
My goal is to calculate the difference between two analyses of each patient (these are all fictional) based on 'Type' column and obtain a following table:
Name Before After Difference
0 Doe j. 10.6 6.5 3.9
I've tried groupby but didn't succeed. Would appreciate any help.
Use DataFrame.pivot
with subtract:
df = df.pivot('Name','Type','Result').reset_index().rename_axis(columns=None)
df['diff'] = df['before'].sub(df['after'])
print (df)
Name after before diff
0 Doe J. 6.5 10.6 4.1
1 Jackson D. 12.2 14.2 2.0
2 Pine C. 11.0 13.3 2.3
3 Schafer L. 9.3 8.4 -0.9
If error:
ValueError: Index contains duplicate entries, cannot reshape
it means there are duplicates, it means for same Name, Type
are 2 or more values like:
print (df)
Name Analysis Result Date Type
0 Doe J. Albumine 10.6 23.02.2021 8:07:22 before <- duplicate Doe J., before
0 Doe J. Albumine 10.6 23.02.2021 8:07:22 before <- duplicate Doe J., before
1 Doe J. Albumine 6.5 25.03.2021 8:08:09 after
2 Pine C. Albumine 13.3 25.03.2021 9:17:54 before
3 Pine C. Albumine 11.0 22.02.2021 9:25:54 after
4 Jackson D. Albumine 14.2 23.02.2021 10:51:38 before
5 Jackson D. Albumine 12.2 23.03.2021 after
6 Schafer L. Albumine 8.4 25.02.2021 10:39:39 before
7 Schafer L. Albumine 9.3 25.03.2021 12:06:15 after
Possible solution with DataFrame.pivot_table
and some aggregate function like mean
, sum
. If need first matched value use aggfunc='first'
df = df.pivot_table(index='Name',columns='Type',values='Result', aggfunc='sum').reset_index().rename_axis(columns=None)
df['diff'] = df['before'].sub(df['after'])
print (df)
Name after before diff
0 Doe J. 6.5 21.2 14.7 <- 21.2 because sum
1 Jackson D. 12.2 14.2 2.0
2 Pine C. 11.0 13.3 2.3
3 Schafer L. 9.3 8.4 -0.9