Search code examples
matlabstatisticsprobabilitycdf

defining the X values for a code


I have this task to create a script that acts similarly to normcdf on matlab.

       x=linspace(-5,5,1000); %values for x
       p= 1/sqrt(2*pi) * exp((-x.^2)/2); % THE PDF for the standard normal 
       t=cumtrapz(x,p); % the CDF for the standard normal distribution
       plot(x,t); %shows the graph of the CDF

The problem is when the t values are assigned to 1:1000 instead of -5:5 in increments. I want to know how to assign the correct x values, that is -5:5,1000 to the t values output? such as when I do t(n) I get the same result as normcdf(n).

Just to clarify: the problem is I cannot simply say t(-5) and get result =1 as I would in normcdf(1) because the cumtrapz calculated values are assigned to x=1:1000 instead of -5 to 5.


Solution

  • Updated answer

    Ok, having read your comment; here is how to do what you want:

    x = linspace(-5,5,1000);
    p = 1/sqrt(2*pi) * exp((-x.^2)/2);
    cdf = cumtrapz(x,p);
    
    q = 3; % Query point
    disp(normcdf(q)) % For reference
    [~,I] = min(abs(x-q)); % Find closest index
    disp(cdf(I)) % Show the value
    

    Sadly, there is no matlab syntax which will do this nicely in one line, but if you abstract finding the closest index into a different function, you can do this:

    cdf(findClosest(x,q))
    
    function I = findClosest(x,q)
    if q>max(x) || q<min(x)
        warning('q outside the range of x');
    end
    [~,I] = min(abs(x-q));
    end
    

    Also; if you are certain that the exact value of the query point q exists in x, you can just do

    cdf(x==q);
    

    But beware of floating point errors though. You may think that a certain range outght to contain a certain value, but little did you know it was different by a tiny roundoff erorr. You can see that in action for example here:

    x1 = linspace(0,1,1000); % Range
    x2 = asin(sin(x1)); % Ought to be the same thing
    plot((x1-x2)/eps); grid on; % But they differ by rougly 1 unit of machine precision
    

    Old answer

    As far as I can tell, running your code does reproduce the result of normcdf(x) well... If you want to do exactly what normcdf does them use erfc.

    close all; clear; clc;
    
    x = linspace(-5,5,1000);
    cdf = normcdf(x); % Result of normcdf for comparison
    
    %% 1 Trapezoidal integration of normal pd
    p = 1/sqrt(2*pi) * exp((-x.^2)/2);
    cdf1 = cumtrapz(x,p);
    
    %% 2 But error function IS the integral of the normal pd
    cdf2 = (1+erf(x/sqrt(2)))/2;
    
    %% 3 Or, even better, use the error function complement (works better for large negative x)
    cdf3 = erfc(-x/sqrt(2))/2;
    
    fprintf('1: Mean error = %.2d\n',mean(abs(cdf1-cdf)));
    fprintf('2: Mean error = %.2d\n',mean(abs(cdf2-cdf)));
    fprintf('3: Mean error = %.2d\n',mean(abs(cdf3-cdf)));
    plot(x,cdf1,x,cdf2,x,cdf3,x,cdf,'k--');
    

    This gives me

    1: Mean error = 7.83e-07
    2: Mean error = 1.41e-17
    3: Mean error = 00 <- Because that is literally what normcdf is doing
    

    If your goal is not not to use predefined matlab funcitons, but instead to calculate the result numerically (i.e. calculate the error function) then it's an interesting challange which you can read about for example here or in this stats stackexchange post. Just as an example, the following piece of code calculates the error function by implementing eq. 2 form the first link:

    nerf = @(x,n) (-1)^n*2/sqrt(pi)*x.^(2*n+1)./factorial(n)/(2*n+1);
    
    figure(1); hold on;
    temp = zeros(size(x)); p =[];
    for n = 0:20
        temp = temp + nerf(x/sqrt(2),n);
        if~mod(n,3)
            p(end+1) = plot(x,(1+temp)/2);
        end
    end
    ylim([-1,2]);
    title('\Sigma_{n=0}^{inf}  ( 2/sqrt(pi) ) \times ( (-1)^n x^{2*n+1} ) \div ( n! (2*n+1) )');
    p(end+1) = plot(x,cdf,'k--');
    legend(p,'n = 0','\Sigma_{n} 0->3','\Sigma_{n} 0->6','\Sigma_{n} 0->9',...
        '\Sigma_{n} 0->12','\Sigma_{n} 0->15','\Sigma_{n} 0->18','normcdf(x)',...
        'location','southeast');
    grid on; box on;
    xlabel('x'); ylabel('norm. cdf approximations');
    

    erfApprox