Let's say I have a vector containing N elements, each being its probability. For example, v = [0.01 0.01 0.09 0.82 0.07]
So I want a function f(v)
that returns 4 at 82% of the time, 3 at 9% of the time etc.
The input vector v is always normalized so that sum(v) = 1
, this can be a simplification.
How can I implement this probabilistic function in MATLAB? Or maybe there is a built-in function for this?
If you do NOT have the statistics toolbox (otherwise see Stewie's answer) derive the empirical cdf and then use inverse sampling:
% Numbers of draws
N = 1e3;
% Sample a uniform in (0, 1)
x = rand(N,1);
% Empirical cdf
ecdf = cumsum([0, v]);
% Inverse sampling/binning
[counts, bin] = histc(x,ecdf);
where bin
maps directly to v
. So, if you have a generic set of values y
with probabilitites v
, you should take:
out = y(bin);
Note, as the number of draws N
increases, we get a better approximation of the probabilities:
counts(end) = []; % Discard last bucket x==1
counts./sum(counts)
The function can be:
function [x, counts] = mysample(prob, val, N)
if nargin < 2 || isempty(val), val = 1:numel(prob); end
if nargin < 3 || isempty(N), N = 1; end
[counts, bin] = histc(rand(N,1), cumsum([0, prob(:)']));
x = val(bin);
end