I have an empirical set of data (hypothetically, x=normrnd(10,3,1000,1);) which has a cumulative distribution function as follows:
I also have a set of data x1=[11,11.1,10.1]. I'd like to find the probability of finding the values of x1 if they came from the distribution x. If it were a known continuous function I could evaluate it exactly, but I'd like to do it from the data I have. Any thoughts?
By hand I would find the value on the x axis and trace up to the line and across to the F(x) axis (see figure 1).
EDIT:
size(x1)
10,0000
I have now found out how to get the data that cdfplot uses to plot F(x):
handles=cdfplot(x);
xdata=get(handles,'XData');
ydata=get(handles,'YData');
I think it is now a case of finding which interval of xdata each value of x1 falls in, and then reading off the corresponding value in ydata, e.g.
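Y = zeros(size(x1));              % F(x) value for each element of x1
for i = 1:numel(x1)
    for j = 1:numel(xdata)
        if xdata(j) <= x1(i)
            Y(i) = ydata(j);      % keep the value at the last xdata point not exceeding x1(i)
        end
    end
end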
Is this the most elegant way?
There is a much more elegant way to do that using bsxfun. Also, you can just calculate the empirical CDF using ecdf instead of cdfplot (unless you actually need the plot):
x = normrnd(10,3,1000,1);
[f_data, x_data] = ecdf(x);
x1 = [11, 11.1, 10.1];
idx = sum(bsxfun(@le, x_data(:)', x1(:)), 2);
y1 = f_data(idx);
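As a cross-check, the same values can also be computed straight from the definition of the empirical CDF, F(t) = (number of samples <= t) / n, without going through ecdf at all. The following is only a sketch under a few assumptions: it uses implicit expansion, which needs R2016b or later (on older releases the comparison can be written as bsxfun(@le, x(:), x1(:)')), it replaces normrnd with the equivalent 10 + 3*randn so that it runs in base MATLAB, and the names F1 and below_min are illustrative. Its result should agree with y1 above up to rounding.

```matlab
% Sketch: empirical CDF evaluated from its definition, base MATLAB only.
x  = 10 + 3*randn(1000, 1);      % stand-in for normrnd(10,3,1000,1)
x1 = [11, 11.1, 10.1];

% x(:) is n-by-1 and x1(:)' is 1-by-m, so the comparison expands to an
% n-by-m logical matrix; the column means are the fractions of samples
% that are <= each query point.
F1 = mean(x(:) <= x1(:)', 1);    % 1-by-m vector of empirical CDF values

% A query point below min(x) simply gives 0 here, whereas the index-based
% lookup would produce idx = 0, which is not a valid index into f_data.
below_min = mean(x(:) <= (min(x) - 1), 1);   % 0
```

Both this and the bsxfun version build a matrix with one entry per (sample, query point) pair, so for very long x1 it may be worth processing the query points in chunks.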