Tags: matlab, gaussian, noise, probability-density

4 different Matlab functions to estimate the PDF of data give 4 different results


I might be missing some detail, but I am stuck on a seemingly simple operation. I have a set of data gaussian_noise (representing white noise with mean = mu and std_dev = sigma), and I want to plot the PDF of this data. The values of gaussian_noise are in the range [-0.0155; 0.0155]. I tried several methods, and they all give me different plots, none of which matches the theoretical PDF of a Gaussian distribution with mean = mu and std_dev = sigma. Do you know what I am missing? Normalization? But ksdensity apparently returns a normalized result. Here are some examples of what I am doing and the resulting plots:

[pdf_empir, sample_data] = ksdensity(gaussian_noise);
figure; plot(sample_data, pdf_empir);

Plot of ksdensity

When I change the bandwidth of ksdensity, I get a smooth, continuous curve, but again with different values.

Plot of ksdensity with bandwidth 0.002
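One way to check the ksdensity results is to integrate them rather than compare peaks. The sketch below is a minimal check under stated assumptions: the data are a hypothetical stand-in (10000 samples of zero-mean Gaussian noise with sigma = 0.0027), not the question's actual vector. With the default normal kernel, a bandwidth h spreads the estimate so that, for Gaussian data, its expected shape is roughly a Gaussian with variance sigma^2 + h^2. For h = 0.002 that lowers the peak to roughly 1/(sqrt(0.0027^2 + 0.002^2)*sqrt(2*pi)), about 119, while the area stays close to 1.

```matlab
% Hypothetical stand-in data (assumption): 10000 samples of zero-mean
% white Gaussian noise with sigma = 0.0027, in place of gaussian_noise.
rng(0);
gaussian_noise = 0.0027 * randn(10000, 1);

% Default (automatically chosen) bandwidth, and the 0.002 bandwidth from the question
[f_default, x_default] = ksdensity(gaussian_noise);
[f_wide, x_wide] = ksdensity(gaussian_noise, 'Bandwidth', 0.002);

% Peaks differ, but the area under each estimate should be close to 1
area_default = trapz(x_default, f_default);
area_wide = trapz(x_wide, f_wide);
fprintf('default bandwidth: peak %.1f, area %.3f\n', max(f_default), area_default);
fprintf('bandwidth 0.002:   peak %.1f, area %.3f\n', max(f_wide), area_wide);
```

A large bandwidth relative to sigma (here 0.002 versus 0.0027) therefore flattens the curve; it does not break the normalization.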

By using histogram

figure; histogram(gaussian_noise,'Normalization','pdf');

Plot of histogram's output

Of course I could smooth the curve to obtain a continuous PDF, but what bothers me is that the function values (y-axis) are different for every method.

By using histfit

figure; histfit(gaussian_noise)

Plot of histfit's PDF
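As far as I know from the histfit documentation, histfit draws a histogram of raw counts and scales the fitted normal curve by (number of samples x bin width) so that it lines up with the bars, so its y-axis is in counts rather than density. A minimal sketch of a density-scaled comparison instead, assuming the same hypothetical stand-in data (10000 samples, sigma = 0.0027) and fitting mu and sigma from the sample:

```matlab
% Hypothetical stand-in data (assumption), in place of gaussian_noise
rng(0);
gaussian_noise = 0.0027 * randn(10000, 1);

% Sample estimates of the mean and standard deviation
mu_hat = mean(gaussian_noise);
sigma_hat = std(gaussian_noise);

figure;
histogram(gaussian_noise, 'Normalization', 'pdf');   % bar heights in density units
hold on;
x = linspace(-0.0155, 0.0155, 1000);                 % data range quoted in the question
plot(x, normpdf(x, mu_hat, sigma_hat), 'LineWidth', 1.5);
hold off;
xlabel('noise value');
ylabel('probability density');
legend('histogram (pdf)', 'normpdf fit');
```

Because both the bars and the curve are in density units here, they can be compared directly with the theoretical normpdf plot.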

The theoretical PDF of a Gaussian distribution with mean = 0 and sigma = 0.0027, evaluated with normpdf at 10000 points over the data range x_range = [-0.0155; 0.0155], looks like this:

Theoretical Gaussian PDF

The peak value is 146.9, which corresponds to the theoretical maximum 1/(sigma*sqrt(2*pi)).
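For reference, the Gaussian density, its normalization, and its peak value:

$$
p(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),
\qquad
\int_{-\infty}^{\infty} p(x)\,dx = 1,
\qquad
p_{\max} = p(\mu) = \frac{1}{\sigma\sqrt{2\pi}}
$$

For sigma = 0.0027 exactly, 1/(0.0027 * sqrt(2*pi)) is about 1/0.006768, or roughly 147.8; the quoted 146.9 corresponds to sigma of about 0.00272, so the stated sigma is presumably rounded. Since p(x) has units of one over the units of x, a peak far above 1 is expected for such a narrow distribution; it is the integral, not the peak, that must equal 1.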

Apparently I am doing something wrong when applying the methods above. I guess I need some kind of normalization. But when I divide by length(gaussian_noise), which is a constant, I still obtain different values with the different methods.
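Dividing the counts by N = length(gaussian_noise) alone gives the fraction of samples per bin, which still depends on the bin width. A density estimate also divides by the bin width. With n_k the count in bin k and Delta x_k its width:

$$
\hat p_k = \frac{n_k}{N\,\Delta x_k},
\qquad
\sum_k \hat p_k\,\Delta x_k = \frac{1}{N}\sum_k n_k = 1
$$

Near the peak, n_k/N is approximately p(0) * Delta x_k, so with a bin width of 0.001 it is about 147 * 0.001, roughly 0.15, while the density itself is about 147. This n_k/(N * Delta x_k) form is what histogram's 'Normalization','pdf' option computes.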

Do you have any idea what I am doing wrong? I'd appreciate your ideas and comments.


Solution

  • I think the fundamental point here is that it's not the peaks of the distributions that need to agree, but the areas under them. Also, if a distribution is normalised, the area under it should equal 1.

    In each of the plotting methods above the "bin width" is different, and this changes the value in each bin so that the area under the curve is preserved.
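The bin-width effect can be seen directly with histcounts. This is a minimal sketch, assuming hypothetical stand-in data (10000 samples, sigma = 0.0027) and two illustrative bin widths chosen for the example: the per-bin fraction count/N grows with the bin width, while count/(N * bin width) stays near the theoretical peak and its area is exactly 1.

```matlab
% Hypothetical stand-in data (assumption), in place of gaussian_noise
rng(0);
gaussian_noise = 0.0027 * randn(10000, 1);
N = numel(gaussian_noise);

bin_widths = [0.0005, 0.002];            % illustrative bin widths (assumption)
max_prob = zeros(size(bin_widths));
max_dens = zeros(size(bin_widths));
total_area = zeros(size(bin_widths));

for k = 1:numel(bin_widths)
    w = bin_widths(k);
    counts = histcounts(gaussian_noise, 'BinWidth', w);  % raw counts per bin
    prob = counts / N;                                   % divide by N only
    dens = counts / (N * w);                             % divide by N and bin width
    max_prob(k) = max(prob);
    max_dens(k) = max(dens);
    total_area(k) = sum(dens) * w;                       % sum of bar areas
    fprintf('bin width %.4f: max count/N = %.3f, max density = %.1f, area = %.3f\n', ...
        w, max_prob(k), max_dens(k), total_area(k));
end
```

The same reasoning explains histfit, whose bars are raw counts and therefore scale with both N and the bin width.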