Tags: matlab, probability, cumulative-frequency

How to find if a value lies in between bounds of a histogram?


I have an empirical data set (hypothetically, x = normrnd(10,3,1000,1);) whose cumulative distribution function looks like this:

[Figure 1: empirical CDF of x, with F(x) plotted against x]

I also have a second set of values, x1 = [11, 11.1, 10.1]. For each value in x1 I'd like to find its cumulative probability F(x1), i.e. the probability of drawing a value less than or equal to it, assuming it came from the same distribution as x. If the distribution were a known continuous function I could evaluate it exactly, but I'd like to do it from the data I have. Any thoughts?

By hand, I would find the value on the x axis, trace up to the CDF curve, and then across to the F(x) axis (see Figure 1).
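The by-hand reading has a direct numerical equivalent: the empirical CDF at a value v is the fraction of samples less than or equal to v, F(v) = #{x_i <= v} / n. A minimal sketch of that definition, assuming synthetic data with the same distribution as in the question (generated with randn here, so it runs without the Statistics and Machine Learning Toolbox):

```matlab
% Synthetic data with the same distribution as normrnd(10,3,1000,1)
x  = 10 + 3*randn(1000, 1);
x1 = [11, 11.1, 10.1];            % query values

% Compare every sample against every query: x(:) is 1000x1 and x1(:)' is 1x3,
% so bsxfun expands the comparison into a 1000x3 logical matrix.
below = bsxfun(@le, x(:), x1(:)');

% Column sums count the samples <= each query; dividing by n gives F(x1)
F1 = sum(below, 1) / numel(x);    % 1x3 row vector with values in [0, 1]
disp(F1)
```

Because 10.1 < 11 < 11.1, the entries of F1 satisfy F1(3) <= F1(1) <= F1(2) for any sample.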

EDIT:

In practice x1 is much larger than this example: size(x1) is 10,0000.

I have now found out how to get the data behind the F(x) plot:

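h = cdfplot(x);            % x is the sample data from above (MATLAB is case sensitive)
xdata = get(h, 'XData');   % x coordinates of the stair-step CDF
ydata = get(h, 'YData');   % corresponding F(x) values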

Now I think it's a matter of finding which interval of xdata each value of x1 falls in, and then reading off the corresponding value of ydata.

e.g.

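Y = zeros(size(x1));               % F = 0 for values below all of the data
for i = 1:numel(x1)
    for j = 1:numel(xdata)
        if xdata(j) <= x1(i)       % xdata is sorted, so the last match wins
            Y(i) = ydata(j);       % F at the last breakpoint not exceeding x1(i)
        end
    end
end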

Is this the most elegant way?
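The inner loop above can be replaced by find with the 'last' option, which returns the index of the last breakpoint not exceeding each query value. A sketch, assuming xdata and ydata come from cdfplot as above and are sorted in ascending order; because cdfplot draws a stair-step curve, a breakpoint can appear more than once, and taking the last match picks the value after the jump, which is F at that point:

```matlab
x  = 10 + 3*randn(1000, 1);        % synthetic data, as in the question
x1 = [11, 11.1, 10.1];             % query values

h = cdfplot(x);                    % requires Statistics and Machine Learning Toolbox
xdata = get(h, 'XData');
ydata = get(h, 'YData');

Y = zeros(size(x1));               % stays 0 for queries below every breakpoint
for i = 1:numel(x1)
    % index of the last breakpoint that does not exceed x1(i)
    j = find(xdata <= x1(i), 1, 'last');
    if ~isempty(j)
        Y(i) = ydata(j);
    end
end
disp(Y)
```

This still scans xdata once per query, but it removes the nested loop and the index bookkeeping.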

Solution

  • There is a much more elegant way to do this with bsxfun. Also, you can compute the empirical CDF directly with ecdf instead of cdfplot (unless you actually need the plot):

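    x = normrnd(10,3,1000,1);             % sample data
    [f_data, x_data] = ecdf(x);           % empirical CDF as sorted stair-step breakpoints
    x1 = [11, 11.1, 10.1];                % query values

    % For each query, count how many x_data points are <= it. Because x_data
    % is sorted, that count is the index of the last such point.
    idx = sum(bsxfun(@le, x_data(:)', x1(:)), 2);
    f_ext = [0; f_data];                  % guard: idx is 0 when x1 < min(x), where F = 0
    y1 = f_ext(idx + 1);                  % y1(i) is the empirical F at x1(i)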
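Since MATLAB R2016b, implicit expansion lets you write the same comparison without bsxfun. Both forms build a numel(x1)-by-numel(x_data) logical matrix, though, so with a very large x1 (the question mentions size(x1) of 10,0000) it may help to process the queries in chunks. A sketch of that, where the chunk size and the random query data are hypothetical choices for illustration:

```matlab
x  = 10 + 3*randn(1000, 1);        % synthetic sample data
x1 = 20*rand(100000, 1);           % hypothetical large set of query values (column)

[f_data, x_data] = ecdf(x);        % Statistics and Machine Learning Toolbox
f_ext = [0; f_data];               % prepend F = 0 for queries below min(x)

y1 = zeros(size(x1));
chunk = 10000;                     % hypothetical chunk size; tune to available memory
for s = 1:chunk:numel(x1)
    r = s:min(s + chunk - 1, numel(x1));
    % implicit expansion (R2016b+): 1-by-m row against numel(r)-by-1 column
    idx = sum(x_data(:)' <= x1(r), 2);   % number of breakpoints <= each query
    y1(r) = f_ext(idx + 1);
end
```

Each iteration only holds a chunk-by-numel(x_data) matrix in memory, while the lookup itself is the same count-and-index step as in the answer above.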