Probability distribution of a normal random variable squared - theory vs. simulation in Python


I can't wrap my head around why the theoretical and simulated results differ so much for the probability distribution of a squared normal random variable (e.g. the power of a Gaussian noise voltage signal). I suspect I'm doing something wrong and would appreciate any help.

Here is the code explaining what I'm trying to do:

import numpy as np
from scipy.integrate import quad, simpson as simps  # simps was removed from newer SciPy releases; simpson replaces it
from matplotlib import pyplot as plt

def PDF(x, sigma=1, mu=0):  # Gaussian normal distribution PDF
    return 1/(sigma*np.sqrt(2*np.pi))*np.exp(-1/(2*sigma**2)*(x-mu)**2)

def PDFu(u, u_rms=1, u_mean=0):
    return PDF(u, sigma=u_rms, mu=u_mean)

def PDFP(P):
    return 2*PDFu(np.sqrt(P))  # substitute the input variable with the 'scaled' one

def probDensity(x, nbins):  # calculate the probability density based on the input samples
    distr, bins = np.histogram(x, nbins)  # similar to plt.hist(density=True)
    binWidth = bins[1]-bins[0]
    binCenters = bins[:-1]+binWidth/2
    return distr/len(x)/binWidth, binCenters

npoints = 100000
rms = 1
u = np.random.normal(0, rms, npoints)  # samples with Gaussian normal distribution
P = u**2  # square of the samples with Gaussian normal distribution - should follow chi-squared distribution?

nbins = 500
u_distr, u_bins = probDensity(u, nbins)  # calculate PDF based on the samples
print('U_distr integral = ', simps(u_distr,u_bins))  # integrate the calculated PDF, should be 1
plt.plot(u_bins, u_distr)
us = np.linspace(-10, 10, 500)
PDFu_u = PDFu(us)  # calculate the theoretical PDF
print('PDFu_u integral = ', quad(PDFu, -np.inf, np.inf))  # integral of the theoretical PDF, should be 1
plt.plot(us, PDFu_u)

nbins = 1000
P_distr, P_bins = probDensity(P, nbins)  # calculate PDF based on the samples
print('P_distr integral = ', simps(P_distr, P_bins))  # integrate the calculated PDF, should be 1
plt.plot(P_bins, P_distr)
Ps = np.linspace(0, 8, npoints)
PDFP_P = PDFP(Ps)   # calculate the theoretical PDF
plt.plot(Ps, PDFP_P)
print('PDFP_P integral = ', quad(PDFP, 0, np.inf))  # integral of the theoretical PDF, should be 1

plt.show()

The theoretical and simulated probability distributions of the normal random variable (u) match nicely; I use this as a sanity check. For the squared variable, however, the difference is substantial, and I can't understand why or how to get them to match. I tried various plausible scaling factors for the theoretical distribution (e.g. 0.5, 2, sqrt(2)), but none worked, and I don't see why one would be needed at all. Shouldn't it be enough to substitute 'P' with 'u' according to the formula u=sqrt(P*R) [R=1] and use the normal distribution of 'u' to calculate the PDF value for a given 'P'? I trust the simulated distribution a little more, so I am wondering how the theoretical one should be calculated properly. Why doesn't the substitution method work?

Thank you for the help in advance!


Solution

  • Your theoretical density for the square of a Gaussian is wrong. Here is the calculation. If $X$ is Gaussian, then for the CDF $F$ of the squared variable $Y=X^2$ we have, for $x > 0$,

    $$ F(x) = P(Y<x) = P(X^2 <x) = P(-\sqrt{x} < X < \sqrt{x}) = \Phi(\sqrt{x}) - \Phi(-\sqrt{x}) $$

    where $\Phi$ is the Gaussian CDF

    so for the PDF $f(x)$ of $Y$ we differentiate that and we get

    $$ f(x) = F'(x) = (1/(2\sqrt{x})) \Phi'(\sqrt{x}) + (1/(2\sqrt{x})) \Phi'(-\sqrt{x}) = (1/(2\sqrt{x})) (\phi(\sqrt{x}) + \phi(-\sqrt{x})) $$

    where $\phi = \Phi'$ is the Gaussian PDF

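    so at the very least you are missing the factor $1/(2\sqrt{x})$. With zero mean the two PDF terms are equal, so PDFP should return PDFu(np.sqrt(P))/np.sqrt(P) rather than 2*PDFu(np.sqrt(P)).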
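
The derivation above can be checked numerically. The following is a minimal sketch, assuming a zero-mean Gaussian with sigma = 1; the helper names `cdf_squared_gaussian` and `pdf_squared_gaussian`, the seed, the sample count, and the bin edges are illustrative choices and not part of the original question. It compares a histogram of squared samples against the bin-averaged theoretical density, and also compares the formula with `scipy.stats.chi2` at one degree of freedom, which the squared standard normal should follow.

```python
import numpy as np
from scipy.stats import norm, chi2

def cdf_squared_gaussian(x, sigma=1.0):
    """CDF of Y = X**2 for X ~ N(0, sigma**2): Phi(sqrt(x)) - Phi(-sqrt(x))."""
    root = np.sqrt(x)
    return norm.cdf(root, scale=sigma) - norm.cdf(-root, scale=sigma)

def pdf_squared_gaussian(x, sigma=1.0):
    """PDF of Y = X**2 for x > 0: (phi(sqrt(x)) + phi(-sqrt(x))) / (2*sqrt(x))."""
    root = np.sqrt(x)
    return (norm.pdf(root, scale=sigma) + norm.pdf(-root, scale=sigma)) / (2.0 * root)

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
n_samples = 200_000
P = rng.normal(0.0, 1.0, n_samples) ** 2  # squared Gaussian samples

# Empirical density on [0, 8]. Dividing by the total sample count (rather than
# using density=True) keeps the samples beyond the last edge in the normalization.
edges = np.linspace(0.0, 8.0, 81)
counts, _ = np.histogram(P, bins=edges)
widths = np.diff(edges)
empirical_density = counts / (n_samples * widths)

# Theoretical average density in each bin, obtained from the CDF differences.
expected_density = np.diff(cdf_squared_gaussian(edges)) / widths
max_hist_error = np.max(np.abs(empirical_density - expected_density))

# For sigma = 1 the formula should coincide with chi-squared, 1 degree of freedom.
xs = np.linspace(0.05, 8.0, 200)
max_chi2_gap = np.max(np.abs(pdf_squared_gaussian(xs) - chi2.pdf(xs, df=1)))

print("max |histogram - theory| =", max_hist_error)
print("max |formula - chi2(1)|  =", max_chi2_gap)
```

Comparing against the bin-averaged density instead of the density at the bin centers avoids a spurious mismatch in the first bin, where the PDF diverges like $1/\sqrt{x}$ as $x \to 0$.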
