Search code examples
matlabstatisticsprobabilitykernel-densityprobability-density

Estimate pdf of a vector using Gaussian Kernel


I am using Gaussian kernel to estimate a pdf of a data based on the equation enter image description here where K(.) is Gaussian kernel, data is a given vector. z is bin from 1 to 256. size of bin is 1.

I implemented by matlab code. However, the result show the amplitude of my pdf estimation (blue color) is not similar with real pdf of data. Could you see my code and give me some comment about my code?

MATLAB CODE

function pdf_est=KDE()
close all;
%%Random values of 20 pixels, range=[1 256]
data=randi([1 256],1,20);

%% Estimate histogram%%%%% 
pdf_est=zeros(1,256);
z=256;

for i=1:z
    for j=1:length(data)
        pdf_est(i)=pdf_est(i)+Gaussian(i-data(j));
    end
end
%% Plot real histogram 1 to 256; binsize=1;
hold on
plot(imhist(uint8(data))./length(data),'r');
%% Plot histogram estimation
plot(pdf_est./length(data),'b');
hold off
function K=Gaussian(x)
   sigma=1;
   K=1./(sqrt(2*pi)*sigma)*exp(-x^2./(2*sigma^2));

RESULT BLUE is my result and RED is real pdf enter image description here


Solution

  • You have two issues:

    1. A 1-unit displacement between blue and red plots.
    2. The blue spikes are wider and less tall than the red ones.

    How to solve each issue:

    1. This is caused by a possible confusion between the data range 0,...,255 and the indexing interval 1,...,256. Since your data represents an 8-bit image, values should be 0,...,255 (not 1,...,256). Your plotted horizontal axis should then be 0,...,255. Same goes for the i variable in the for line. And then, since Matlab indexing starts at 1, you should use i+1 when indexing pdf_est.

    2. This is normal behaviour. You are assuming unit variance in your kernel. To see taller blue spikes you could reduce sigma to make the kernel narrower and taller. But you will never get the exact same height as your data (the necessary sigma would depend on your data).

      Actually, you have a tradeoff between height and width, controlled by sigma. But the important thing is that the area remains the same for any sigma. So I suggest plotting the CDF (area) instead of the pdf (area densisty). To do that, plot the accumulated histogram and pdf (using cumsum).

    Code modified according to 1:

    function pdf_est=KDE()
    close all;
    %%Random values of 20 pixels, range=[1 256]
    data=randi([1 256],1,20)-1; %// changed: "-1"
    
    %% Estimate histogram%%%%% 
    pdf_est=zeros(1,256);
    z=256;
    
    for i=0:z-1 %// changed_ subtracted 1 
        for j=1:length(data)
            pdf_est(i+1)=pdf_est(i+1)+Gaussian(i-data(j)); %// changed: "+1" (twice)
        end
    end
    %% Plot real histogram 1 to 256; binsize=1;
    hold on
    plot(0:255, imhist(uint8(data))./length(data),'r'); %// changed: explicit x axis
    %% Plot histogram estimation
    plot(0:255, pdf_est./length(data),'b'); %// changed: explicit x axis
    hold off
    function K=Gaussian(x)
       sigma=1; %// change? Set as desired
       K=1./(sqrt(2*pi)*sigma)*exp(-x^2./(2*sigma^2));
    

    enter image description here

    Code modified according to 1 and 2:

    function pdf_est=KDE()
    close all;
    %%Random values of 20 pixels, range=[1 256]
    data=randi([1 256],1,20)-1; %// changed: "-1"
    
    %% Estimate histogram%%%%% 
    pdf_est=zeros(1,256);
    z=256;
    
    for i=0:z-1 %// changed: subtracted 1 
        for j=1:length(data)
            pdf_est(i+1)=pdf_est(i+1)+Gaussian(i-data(j)); %// changed: "+1" (twice)
        end
    end
    %% Plot real histogram 1 to 256; binsize=1;
    hold on
    plot(0:255, cumsum(imhist(uint8(data))./length(data)),'r'); %// changed: explicit x axis
                                                                %// changed: cumsum
    %% Plot histogram estimation
    plot(0:255, cumsum(pdf_est./length(data)),'b'); %// changed: explicit x axis
                                                    %// changed: cumsum
    hold off
    function K=Gaussian(x)
       sigma=1; %// change? Set as desired
       K=1./(sqrt(2*pi)*sigma)*exp(-x^2./(2*sigma^2));
    

    enter image description here