Pandas Creating Normal Dist series

I'm trying to convert an Excel "normal distribution" formula into Python.

(1-NORM.DIST(a+col,b,c,TRUE))/(1-NORM.DIST(a,b,c,TRUE))

For example: Here's my given df

Id   a  b    c    
ijk  4  3.5  12.53
xyz  12 3    10.74

My goal:

Id   a  b    c      0    1    2    3
ijk  4  3.5  12.53  1   .93  .87  .81
xyz  12 3    10.74  1   .87  .76  .66

Here's the math behind it:

column 0: always 1

column 1: (1-NORM.DIST(a+1,b,c,TRUE))/(1-NORM.DIST(a,b,c,TRUE))

column 2: (1-NORM.DIST(a+2,b,c,TRUE))/(1-NORM.DIST(a,b,c,TRUE))

column 3: (1-NORM.DIST(a+3,b,c,TRUE))/(1-NORM.DIST(a,b,c,TRUE))

This is what I have so far:

df1 = pd.DataFrame(df, columns=np.arange(0,4))
result = pd.concat([df, df1], axis=1, join_axes=[df.index])
result[0] = 1

I'm not sure what to do after this.

This is how I use the normal distribution function: https://support.office.com/en-us/article/normdist-function-126db625-c53e-4591-9a22-c9ff422d6d58

Many many thanks!

Solution

NORM.DIST(..., TRUE) is the cumulative distribution function (CDF), and 1 - NORM.DIST(..., TRUE) is the survival function. Both are available in SciPy's stats module as scipy.stats.norm.cdf and scipy.stats.norm.sf. For example,

import scipy.stats as ss
ss.norm.cdf(4, 3.5, 12.53)
Out:
0.51591526057026538
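
As a quick sanity check (a sketch using the first row's values from the question), ss.norm.sf should agree with 1 - ss.norm.cdf up to floating-point error. Calling sf directly is usually the better choice, since it avoids subtracting two nearly equal numbers far out in the upper tail:

```python
import scipy.stats as ss

# Values taken from the first row of the example DataFrame
a, b, c = 4, 3.5, 12.53

cdf = ss.norm.cdf(a, b, c)  # Excel: NORM.DIST(a, b, c, TRUE)
sf = ss.norm.sf(a, b, c)    # Excel: 1 - NORM.DIST(a, b, c, TRUE)

# The two numbers printed here should match up to rounding error
print(sf, 1 - cdf)
```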

For your case, you can first define a function:

def normalize(a, b, c, col):
    return ss.norm.sf(a+col, b, c) / ss.norm.sf(a, b, c)

and call that function with apply:

for col in range(4):
    df[col] = df.apply(lambda x: normalize(x.a, x.b, x.c, col), axis=1)

df
Out: 
    Id   a    b      c    0         1         2         3
0  ijk   4  3.5  12.53  1.0  0.934455  0.869533  0.805636
1  xyz  12  3.0  10.74  1.0  0.875050  0.760469  0.656303

This is not the most efficient approach: it recomputes the survival function at a for every column, and it involves two levels of looping (the for loop over columns and the row-wise apply). One level can be removed by passing an array of values to ss.norm.sf:

import numpy as np
import pandas as pd

out = df.apply(
    lambda x: pd.Series(
        ss.norm.sf(x.a + np.arange(4), x.b, x.c) / ss.norm.sf(x.a, x.b, x.c)
    ), axis=1
)
out
Out: 
     0         1         2         3
0  1.0  0.934455  0.869533  0.805636
1  1.0  0.875050  0.760469  0.656303

And you can use join to add this to your original DataFrame:

df.join(out)
Out: 
    Id   a    b      c    0         1         2         3
0  ijk   4  3.5  12.53  1.0  0.934455  0.869533  0.805636
1  xyz  12  3.0  10.74  1.0  0.875050  0.760469  0.656303
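
If the remaining row-wise apply becomes a bottleneck, the second loop can be dropped as well by letting NumPy broadcast a column of a values against a row of offsets. The following is only a sketch under the assumption that df looks like the example in the question; the names offsets, ratios and result are illustrative and not part of the original code:

```python
import numpy as np
import pandas as pd
import scipy.stats as ss

# Example data copied from the question
df = pd.DataFrame({
    "Id": ["ijk", "xyz"],
    "a": [4, 12],
    "b": [3.5, 3.0],
    "c": [12.53, 10.74],
})

offsets = np.arange(4)  # the 0, 1, 2, 3 added to a

# Reshape each column to (n_rows, 1) so it broadcasts against offsets (4,)
a = df["a"].to_numpy()[:, None]
b = df["b"].to_numpy()[:, None]
c = df["c"].to_numpy()[:, None]

# A single call evaluates the survival function for every row/offset pair
ratios = ss.norm.sf(a + offsets, b, c) / ss.norm.sf(a, b, c)

out = pd.DataFrame(ratios, index=df.index, columns=offsets)
result = df.join(out)
print(result)
```

This should give the same table as the apply-based versions, with the denominator computed once per row.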