python pandas dataframe pandas-groupby spotfire

Adding a calculated column of an equation using pandas groupby()

I'm new to working with pandas and I am trying to add a column with groupby() based off of a Spotfire calculated column formula

Suppose I had a table with the following data (df1):

'Well ID','Assay','Source','Treat','BkgrdSub Fluorescence','Calced'
'A1',4,'Source 1','OPA',-215.75,0.035583351
'A2',4,'Source 1','OPA',-160.75,0.130472288
'A3',4,'Source 1','OPA',343.25,1
'H10',6,'Source 1','OPP',9896,1
'H11',6,'Source 1','OPP',9892,0.999605226
'H12',6,'Source 1','CN',-1,1
'A1',4,'Source 2','OPA',-170,0.03682641
'A2',4,'Source 2','OPA',-86,0.083431583
'A3',4,'Source 2','OPA',1566,1
'H10',6,'Source 2','ZI',4885,0.809271732
'H11',6,'Source 2','ZI',6092,1
'H12',6,'Source 2','CN',78,1
'A1',4,'Source 3','OPA',-114.5,0.037329147
'A2',4,'Source 3','OPA',-114.5,0.037329147
'A3',4,'Source 3','OPA',3028.5,1
'H10',6,'Source 3','ZIII',4245.375,0.85305734
'H11',6,'Source 3','ZIII',5017.375,1
'H12',6,'Source 3','CN',20.375,1
'A1',4,'Source 4','OPA',-183.375,0.017731683
'A2',4,'Source 4','OPA',-102.375,0.044831047
'A3',4,'Source 4','OPA',2752.625,1
'H10',6,'Source 4','ZIIII',2635.75,0.697943562
'H11',6,'Source 4','ZIIII',3878.75,1
'H12',6,'Source 4','CN',-10.25,1
'A1',4,'Source 5','OPA',-236.375,0
'A2',4,'Source 5','OPA',-199.375,0.028094153
'A3',4,'Source 5','OPA',1080.625,1
'H10',6,'Source 5','ZV',3489,0.952202946
'H11',6,'Source 5','ZV',3676,1
'H12',6,'Source 5','CN',31,1
'A1',4,'Source 6','OPA',-221.375,0.008870491
'A2',4,'Source 6','OPA',-150.375,0.050857481
'A3',4,'Source 6','OPA',1454.625,1
'H10',6,'Source 6','ZVI',2224.375,1
'H11',6,'Source 6','ZVI',1418.375,0.672457584
'H12',6,'Source 6','CN',716.375,1

I want to be able to add one calculated column that is defined by the Spotfire equation:

([BkgrdSub Fluorescence] - Min([BkgrdSub Fluorescence])) / Max([BkgrdSub Fluorescence] - Min([BkgrdSub Fluorescence])) OVER ([Treat],[Source],[Assay])

I was building the script one step at a time and I got to the point where I tried to run this with groupby():

import pandas as pd

df1.insert(5,"Scaled BckgrdSub Fluorescence min","")
df1['Scaled BckgrdSub Fluorescence min'] = df1.groupby(['Treat','Source','Assay'])['BkgrdSub Fluorescence'].transform('min')
df1.insert(6,"Scaled BckgrdSub Fluorescence eq","")
df1['Scaled BckgrdSub Fluorescence eq'] = df1[['BkgrdSub Fluorescence'] - ['Scaled BckgrdSub Fluorescence min']].groupby(df1['Treat'],df1['Source'],df1['Assay']).transform('max')

But I get the error:

TypeError: unsupported operand type(s) for -: 'list' and 'list'

Which I understand means that I can't subtract a list from a list. So obviously the syntax doesn't support equations within the groupby() function.

I also tried to avoid this syntax error by avoiding using groupby() with 'Scaled BckgrdSub Fluorescence' being the desired outcome column:

df1.insert(5,"Scaled BckgrdSub Fluorescence min","")
df1['Scaled BckgrdSub Fluorescence min'] = df1.groupby(['Treat','Source','Assay'])['BkgrdSub Fluorescence'].transform('min')
df1.insert(6,"Scaled BckgrdSub Fluorescence eq","")
df1['Scaled BckgrdSub Fluorescence eq'] = df1['BkgrdSub Fluorescence'] - df1['Scaled BckgrdSub Fluorescence min']
df1.insert(7,"Scaled BckgrdSub Fluorescence max","")
df1['Scaled BckgrdSub Fluorescence max'] = df1.groupby(['Treat','Source','Assay'])['Scaled BckgrdSub Fluorescence eq'].transform('max')
df1.insert(8,"Scaled BckgrdSub Fluorescence","")
df1['Scaled BckgrdSub Fluorescence'] = df1['Scaled BckgrdSub Fluorescence eq'] / df1['Scaled BckgrdSub Fluorescence max']

However, this isn't the same outcome as the calculated column you get in Spotfire.

The expected output, taken from Spotfire, for the calculated column is the already shown in the 'Calced' column.

So my question is, is there a simple way to add the column that I want with the groupby() function in a few lines while remaining accurate?

Solution

Here is one way to duplicate your target numbers:

df1.groupby('Source').apply(lambda x: ((x['BkgrdSub Fluorescence'] - df1['BkgrdSub Fluorescence'].min()) / (x['BkgrdSub Fluorescence'].max() - df1['BkgrdSub Fluorescence'].min())))

Result:

Source1  0     0.035583
         1     0.130472
         2     1.000000
Source2  3     0.036826
         4     0.083432
         5     1.000000
Source3  6     0.037329
         7     0.037329
         8     1.000000
Source4  9     0.017732
         10    0.044831
         11    1.000000
Source5  12    0.000000
         13    0.028094
         14    1.000000
Source6  15    0.008870
         16    0.050857
         17    1.000000

Updated for new dataset:

df1.index.name = 'ID' # will be used for sorting
# group and apply equation
grouped = df1.groupby(['Treat','Source','Assay']).apply(lambda x: ((x['BkgrdSub Fluorescence'] 
      - df1['BkgrdSub Fluorescence'].min()) / (x['BkgrdSub Fluorescence'].max() 
      - df1['BkgrdSub Fluorescence'].min())))
# remove multi-index from grouped, sort on ID, and insert into df1
df1['Calculated'] = grouped.sort_index(level='ID').reset_index(drop=True)

New result (see Calculated column):

   Well ID  Assay    Source  Treat  BkgrdSub Fluorescence    Calced   Calculated
ID                                                                              
0       A1      4  Source 1    OPA               -215.750  0.035583     0.035583
1       A2      4  Source 1    OPA               -160.750  0.130472     0.130472
2       A3      4  Source 1    OPA                343.250  1.000000     1.000000
3      H10      6  Source 1    OPP               9896.000  1.000000     1.000000
4      H11      6  Source 1    OPP               9892.000  0.999605     0.999605
5      H12      6  Source 1     CN                 -1.000  1.000000     1.000000
6       A1      4  Source 2    OPA               -170.000  0.036826     0.036826
7       A2      4  Source 2    OPA                -86.000  0.083432     0.083432
8       A3      4  Source 2    OPA               1566.000  1.000000     1.000000
9      H10      6  Source 2     ZI               4885.000  0.809272     0.809272
10     H11      6  Source 2     ZI               6092.000  1.000000     1.000000
11     H12      6  Source 2     CN                 78.000  1.000000     1.000000
12      A1      4  Source 3    OPA               -114.500  0.037329     0.037329
13      A2      4  Source 3    OPA               -114.500  0.037329     0.037329
14      A3      4  Source 3    OPA               3028.500  1.000000     1.000000
15     H10      6  Source 3   ZIII               4245.375  0.853057     0.853057
16     H11      6  Source 3   ZIII               5017.375  1.000000     1.000000
17     H12      6  Source 3     CN                 20.375  1.000000     1.000000
18      A1      4  Source 4    OPA               -183.375  0.017732     0.017732
19      A2      4  Source 4    OPA               -102.375  0.044831     0.044831
20      A3      4  Source 4    OPA               2752.625  1.000000     1.000000
21     H10      6  Source 4  ZIIII               2635.750  0.697944     0.697944
22     H11      6  Source 4  ZIIII               3878.750  1.000000     1.000000
23     H12      6  Source 4     CN                -10.250  1.000000     1.000000
24      A1      4  Source 5    OPA               -236.375  0.000000     0.000000
25      A2      4  Source 5    OPA               -199.375  0.028094     0.028094
26      A3      4  Source 5    OPA               1080.625  1.000000     1.000000
27     H10      6  Source 5     ZV               3489.000  0.952203     0.952203
28     H11      6  Source 5     ZV               3676.000  1.000000     1.000000
29     H12      6  Source 5     CN                 31.000  1.000000     1.000000
30      A1      4  Source 6    OPA               -221.375  0.008870     0.008870
31      A2      4  Source 6    OPA               -150.375  0.050857     0.050857
32      A3      4  Source 6    OPA               1454.625  1.000000     1.000000
33     H10      6  Source 6    ZVI               2224.375  1.000000     1.000000
34     H11      6  Source 6    ZVI               1418.375  0.672458     0.672458
35     H12      6  Source 6     CN                716.375  1.000000     1.000000