I'm trying to convert an excel "normal distribution" formula into python.
(1-NORM.DIST(a+col,b,c,TRUE))/(1-NORM.DIST(a,b,c,TRUE)))
For example: Here's my given df
Id a b c
ijk 4 3.5 12.53
xyz 12 3 10.74
My goal:
Id a b c 0 1 2 3
ijk 4 3.5 12.53 1 .93 .87 .81
xyz 12 3 10.74 1 .87 .76 .66
Here's the math behind it:
column 0: always 1
column 1: (1-NORM.DIST(a+1,b,c,TRUE))/(1-NORM.DIST(a,b,c,TRUE))
column 2: (1-NORM.DIST(a+2,b,c,TRUE))/(1-NORM.DIST(a,b,c,TRUE))
column 3: (1-NORM.DIST(a+3,b,c,TRUE))/(1-NORM.DIST(a,b,c,TRUE))
This is what I have so far:
df1 = pd.DataFrame(df, columns=np.arange(0,4))
result = pd.concat([df, df1], axis=1, join_axes=[df.index])
result[0] = 1
I'm not sure what to do after this.
This is how I use the normal distribution function: https://support.office.com/en-us/article/normdist-function-126db625-c53e-4591-9a22-c9ff422d6d58
Many many thanks!
NORM.DIST(..., TRUE)
means the cumulative distribution function and 1 - NORM.DIST(..., TRUE)
means the survival function. These are available under scipy's stats module (see ss.norm). For example,
import scipy.stats as ss
ss.norm.cdf(4, 3.5, 12.53)
Out:
0.51591526057026538
For your case, you can first define a function:
def normalize(a, b, c, col):
return ss.norm.sf(a+col, b, c) / ss.norm.sf(a, b, c)
and call that function with apply
:
for col in range(4):
df[col] = df.apply(lambda x: normalize(x.a, x.b, x.c, col), axis=1)
df
Out:
Id a b c 0 1 2 3
0 ijk 4 3.5 12.53 1.0 0.934455 0.869533 0.805636
1 xyz 12 3.0 10.74 1.0 0.875050 0.760469 0.656303
This is not the most efficient approach as it calculates the survival function for same values again and involves two loops. One level of loops can be omitted by passing an array of values to ss.sf
:
out = df.apply(
lambda x: pd.Series(
ss.norm.sf(x.a + np.arange(4), x.b, x.c) / ss.norm.sf(x.a, x.b, x.c)
), axis=1
)
Out:
0 1 2 3
0 1.0 0.934455 0.869533 0.805636
1 1.0 0.875050 0.760469 0.656303
And you can use join to add this to your original DataFrame:
df.join(out)
Out:
Id a b c 0 1 2 3
0 ijk 4 3.5 12.53 1.0 0.934455 0.869533 0.805636
1 xyz 12 3.0 10.74 1.0 0.875050 0.760469 0.656303