Search code examples
matlabprobabilitystochastic-process

How to choose one element based on its probability?


Let's say I have a vector containing N elements, each being its probability. For example, v = [0.01 0.01 0.09 0.82 0.07]

So I want a function f(v) that returns 4 at 82% of the time, 3 at 9% of the time etc.

The input vector v is always normalized so that sum(v) = 1, this can be a simplification.

How can I implement this probabilistic function in MATLAB? Or maybe there is a built-in function for this?


Solution

  • If you do NOT have the statistics toolbox (otherwise see Stewie's answer) derive the empirical cdf and then use inverse sampling:

    % Numbers of draws
    N = 1e3; 
    % Sample a uniform in (0, 1)
    x = rand(N,1);
    
    % Empirical cdf
    ecdf = cumsum([0, v]);
    
    % Inverse sampling/binning 
    [counts, bin] = histc(x,ecdf);
    

    where bin maps directly to v. So, if you have a generic set of values y with probabilitites v, you should take:

    out = y(bin);
    

    Note, as the number of draws N increases, we get a better approximation of the probabilities:

    counts(end) = []; % Discard last bucket x==1
    counts./sum(counts)
    

    The function can be:

    function [x, counts] = mysample(prob, val, N)
    if nargin < 2 || isempty(val), val = 1:numel(prob); end
    if nargin < 3 || isempty(N),   N   = 1;             end 
    
    [counts, bin] = histc(rand(N,1), cumsum([0, prob(:)']));
    x = val(bin);
    
    end