Search code examples
matlabprobabilityconvergenceprobability-distribution

PDF and CDF for Biased die using Matlab with Central Limit Theorem


I am trying to plot the PDF and CDF for a biased die roll for 10^4 samples using Central Limit Theorem.(CLT)

The die is biased or unfair where even sides are twice as likely as odd sides. Here is the diefaces = [1,3,2,4,6,2].What can I use in Matlab to find probability of an odd number in this case with CLT where Sn = X1 + X2 + ... + Xn , n = 40.

Here is what I have tried so far. The part that I am struggling with is passing the samples which in this case is 10^4 and n=40. Appreciate any help....

clear
% Roll the dice "numberOfRolls" times
numberOfRolls = 10000; % Number of times you roll the dice.
% Biased die with even sides twice as likely as a odd number
diefaces = [1,3,2,4,6,2];
n = 1; % Number of dice.
maxFaceValue = 6;
% Pick a random number from diefaces along with the number of rolls which
% is not working :(
x = diefaces(randperm(numel(diefaces),1),10000)


    S1 = cumsum(x)
    hist1= histogram(S1(1,:),'Normalization','pdf','EdgeColor', 'blue', 'FaceColor',  'blue')

Solution

  • Define your die and obtain a valid probability mass function (PMF) for your custom die. You can verify the PMF by ensuring sum(Prob) equals 1. Note that a fair die is obtained by setting RelChance to [1 1 1 1 1 1].

    The die face probabilities below are [1/9 2/9 1/9 2/9 1/9 2/9].

    Die = [1 2 3 4 5 6];
    RelChance = [1 2 1 2 1 2];            % Relative Chance
    Prob = RelChance./sum(RelChance);     % probability mass function for die
    

    You can use datasample() to simulate the outcome of rolling the die (requires Statistics toolbox). This is easy enough to hard code if absolutely necessary through several methods.

    The code below reads sample NumRolls many times from Die with the probability Prob(ii) representing the probability of Die(ii).

    % MATLAB R2019a
    NumRolls = 13;                        % Number of rolls 
    Rolls = datasample(Die,NumRolls,'Weights',Prob);
    

    Now, to use this to accomplish the stated goal. Similar to this post, create an array X that has the first row as realizations of X1, the second row realizations of X2, and so on. This doesn't have to be fancy.

    And again, use cumsum() to get the cumulative sum along the columns. This means the first row is realizations from S1=X1, second row is realizations from S2=X1+X2, and the 40th row is empirical samples from S40 = X1 + X2 + ... + X40.

    n_max = 40;
    NumRolls = 10000;
    X = zeros(n_max,NumRolls);
    for n = 1:n_max
        X(n,:) = datasample(Die,NumRolls,'Weights',Prob);
    end
    Sn = cumsum(X);
    

    How to Plot? At this point, since our variable names match the process, the remaining steps (plotting) are identical to this post but with a few minor modifications. Since this is discrete (not continuous) data, I generated the plot below using the 'Normalization','probability' option for histogram(). References to a probability density function (PDF) have been relabeled probability mass function (PMF) accordingly.

    Images showing the discrete analog to the Central Limit Theorem's convergence.


    Continuous version of the Central Limit Theorem is posted here.