How do you calculate the CDF of a linear transformation of the normal distribution in Python?

I have a pdf which is a linear transformation of the normal distribution:

T = 0.5A + 0.5B

Mean_A = 276

Standard Deviation_A = 6.5

Mean_B = 293

Standard Deviation_B = 6

How do I calculate the probability that T is between 281 and 291 in Python?

I have tried the following code:

mu1 = 276

sigma1 = 6.5

mu2 = 293

sigma2 = 6

normalized = 0.5 * scipy.stats.norm.pdf(x, loc = mu1, scale = sigma1) + 0.5 * scipy.stats.norm.pdf(x, loc = mu2, scale = sigma2)

print(normalized.cdf(291) - normalized.cdf(281))

But this came up with an error.

I've also tried to calculate the CDF of T ~ N(284.5, 19.5625) and

print(norm.cdf(291 - 284.5/4.422952)), etc but this came up with an incorrect answer.

Any help would be much appreciated!


  • Your comment suggests that you're assuming the variables are independent, since in that case the mean and the variance of the sum are as you've given.

    Then, you can define the sum through

    normalized = scipy.stats.norm(0.5*mu1 + 0.5*mu2, np.sqrt((0.5*sigma1)**2 + (0.5*sigma2)**2))

    and in particular, get your desired probability using cdf as you did:

    In [27]: normalized.cdf(291) - normalized.cdf(281)                                              
    Out[27]: 0.7147892127602181

    To validate that this result matches expectation, we can run a quick simulation:

    In [31]: N = 10**7                                                                               
    In [32]: rvs = 0.5*np.random.normal(mu1, sigma1, size=N) + 0.5*np.random.normal(mu2, sigma2, size=N)     
    In [33]: ((rvs > 281) & (rvs < 291)).mean()                                                              
    Out[33]: 0.7148597

    Indeed, this is a reasonable approximation to the exact result above.

    Edit: As per the comment given to this answer, OP is actually interested in the random variable whose PDF is

    f_T(x) = 0.5 * f_A(x) + 0.5 * f_B(x), where f_A and f_B are the PDFs of A and B.

    Notably, this is not a linear combination of normally distributed variables (and it's not itself a normally distributed variable for that matter), so if it's phrased as such in whichever exercise you're given, then they've worded it incorrectly.

    This case is even simpler though: Integrating the PDF from 281 to 291 can be done by integrating each summand, which in turn is nothing but the PDF of a normal distribution, so that you can proceed as above:

    In [43]: n1 = scipy.stats.norm(mu1, sigma1)                                                                       
    In [44]: n2 = scipy.stats.norm(mu2, sigma2)                                                                       
    In [45]: .5*(n1.cdf(291) - n1.cdf(281) + n2.cdf(291) - n2.cdf(281))                                      
    Out[45]: 0.2785306219161424
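
For reference, here is a minimal self-contained sketch that puts both readings of the question side by side. It assumes A and B are independent for the first reading, and it uses scipy.integrate.quad (not part of the answer above) as an independent numerical check for the second. The names lo, hi, mu_t, sigma_t, p_sum, p_mix and mixture_pdf are made up for illustration. The first part also standardizes by hand: the bound is (x - mean) / sd with parentheses around the numerator, which the attempt 291 - 284.5/4.422952 in the question leaves out.

```python
import numpy as np
from scipy import integrate, stats

mu1, sigma1 = 276, 6.5
mu2, sigma2 = 293, 6
lo, hi = 281, 291  # interval of interest

# Reading 1: T = 0.5*A + 0.5*B, with A and B assumed independent.
# T is then normal with the mean and standard deviation below.
mu_t = 0.5 * mu1 + 0.5 * mu2
sigma_t = np.sqrt((0.5 * sigma1) ** 2 + (0.5 * sigma2) ** 2)

# Standardize each bound as (x - mu_t) / sigma_t and use the
# standard normal CDF.
p_sum = stats.norm.cdf((hi - mu_t) / sigma_t) - stats.norm.cdf((lo - mu_t) / sigma_t)

# Reading 2: T has the PDF 0.5*f_A + 0.5*f_B (a mixture, not a sum).
def mixture_pdf(x):
    """Equal-weight average of the two normal PDFs."""
    return 0.5 * stats.norm.pdf(x, loc=mu1, scale=sigma1) + 0.5 * stats.norm.pdf(
        x, loc=mu2, scale=sigma2
    )

# Integrate the mixture PDF numerically over [lo, hi].
p_mix, abs_err = integrate.quad(mixture_pdf, lo, hi)

print("linear combination:", p_sum)
print("mixture:", p_mix)
```

The two printed values should agree with Out[27] and Out[45] above, up to numerical integration error in the second case.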