I am not very proficient in statistics; I am trying to learn, so please bear with me. I saw this question on Quora, which basically states the following:
A fair die is rolled. If the result is an odd number, a fair coin is tossed 3 times; if the result is an even number, the coin is tossed 2 times. In both cases, the number of heads is counted. What is the variance of the number of heads obtained?
I wanted to solve it using Python and TensorFlow Probability. Here is what I did:
import tensorflow as tf
import tensorflow_probability as tfp
import numpy as np
tf.enable_eager_execution()
probs = [1/6.] * 6
dices = tfp.distributions.Multinomial(total_count=1000, probs=probs)
n = dices.sample()
HEAD = 1
TAIL = 0
l = list(n.numpy())
heads_even = []
heads_odd = []
for i, nums in enumerate(l):
    mul_by = 3 if (i + 1) % 2 != 0 else 2
    tosses = tfp.distributions.Bernoulli(probs=0.5)
    coin_flip_data = tosses.sample(nums * mul_by)
    l2 = coin_flip_data.numpy()
    unique, counts = np.unique(l2, return_counts=True)
    head_tails = dict(zip(unique, counts))
    if (i + 1) % 2 != 0:
        heads_odd.append(head_tails[HEAD])
    else:
        heads_even.append(head_tails[HEAD])
total_heads = heads_odd + heads_even
final_nd_arr = np.array(total_heads)
print(final_nd_arr.var())
However, final_nd_arr.var() comes out at 2089.805555555556, which is of course nowhere near the actual answer of about 0.68 (as people mentioned in the answers on Quora).
I am unable to figure out what I am doing wrong. How can I fix my mistake?
Any pointers would be helpful. Thanks a lot in advance.
--------- EDIT
To give more data:
dices.sample() => array([169., 173., 149., 171., 175., 163.], dtype=float32)
heads_odd => [266, 210, 259]
heads_even => [176, 167, 145]
total_heads => [266, 210, 259, 176, 167, 145]
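The numbers in the edit are enough to see what the question's code actually measures. The sketch that follows uses plain NumPy and only the posted values: it reproduces the reported 2089.8 and compares each per-face head total with its expected size, rolls × tosses × 0.5. Each total is a sum over 149 to 175 rolls, and odd faces are systematically larger because they get 3 tosses instead of 2, so the spread of these six totals says nothing about the spread of the head count from a single roll.

```python
import numpy as np

# Values posted in the question's edit.
rolls_per_face = np.array([169, 173, 149, 171, 175, 163])  # faces 1..6
heads_odd = [266, 210, 259]   # faces 1, 3, 5
heads_even = [176, 167, 145]  # faces 2, 4, 6
total_heads = np.array(heads_odd + heads_even)

# Reproduces the question's result: the variance of six per-face totals.
print(total_heads.var())  # about 2089.81

# Expected head total per face (ordered by face 1..6) = rolls * tosses * P(head).
tosses = np.array([3, 2, 3, 2, 3, 2])
expected = rolls_per_face * tosses * 0.5
# Odd faces come out between roughly 220 and 265, even faces between 160 and 175.
print(expected)
```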
You are computing the variance over the wrong distribution. The variance we are looking for refers to the experiment in which you roll the die once, count the number of heads, and repeat this many times; the variance is then taken over the per-roll head counts. Your code does simulate the rolls, but it sums the number of heads over all rolls that landed on the same face, and then takes the variance of those six sums.
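For reference, the exact value follows from the law of total variance, conditioning on the parity of the die. In the derivation, X is the number of heads and D is the parity of the roll (odd or even, each with probability 1/2); this notation is introduced here for illustration. Given D, X is binomial with 3 or 2 fair tosses:

```latex
\begin{aligned}
X \mid \text{odd} &\sim \mathrm{Bin}\!\left(3, \tfrac12\right): \quad \mathbb{E}[X \mid \text{odd}] = \tfrac32,\quad \operatorname{Var}(X \mid \text{odd}) = \tfrac34\\
X \mid \text{even} &\sim \mathrm{Bin}\!\left(2, \tfrac12\right): \quad \mathbb{E}[X \mid \text{even}] = 1,\quad \operatorname{Var}(X \mid \text{even}) = \tfrac12\\
\mathbb{E}[X] &= \tfrac12 \cdot \tfrac32 + \tfrac12 \cdot 1 = \tfrac54\\
\operatorname{Var}(X) &= \mathbb{E}\big[\operatorname{Var}(X \mid D)\big] + \operatorname{Var}\big(\mathbb{E}[X \mid D]\big)\\
&= \tfrac12\left(\tfrac34 + \tfrac12\right) + \tfrac12\left(\tfrac32 - \tfrac54\right)^2 + \tfrac12\left(1 - \tfrac54\right)^2\\
&= \tfrac58 + \tfrac1{16} = \tfrac{11}{16} = 0.6875
\end{aligned}
```

This is the value that the Quora answers round to 0.68, and it is what a per-roll simulation should approach.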
import tensorflow as tf
import tensorflow_probability as tfp
import numpy as np
tf.enable_eager_execution()  # TF 1.x only; TF 2.x runs eagerly by default, so drop this line there
# Simulate the outcome of 1000 dice rolls
probs = [1/6.] * 6
dices = tfp.distributions.Multinomial(total_count=1000, probs=probs)
n = dices.sample()
l = list(n.numpy().astype(int))
L = []
# Loop over the 6 possible die faces
for i in range(len(l)):
    # Loop over the rolls that landed on this face
    for _ in range(l[i]):
        # For each individual roll, flip a coin
        # 3 times (odd face) or 2 times (even face)
        num_tosses = 3 if (i + 1) % 2 != 0 else 2
        tosses = tfp.distributions.Bernoulli(probs=0.5)
        coin_flip_data = tosses.sample(num_tosses)
        # And count the number of heads for this roll
        num_heads = np.sum(coin_flip_data.numpy())
        L += [num_heads]
# Variance over the per-roll head counts
print(np.var(L))  # one run printed 0.668999; the value varies between runs
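The per-roll loop above gets slow for large simulations, since it builds a distribution and draws a sample for every single roll. A vectorized sketch follows. It swaps TensorFlow Probability for NumPy's `Generator` API (`integers`, `binomial`), which is an assumption of this sketch rather than part of the answer, and the helper name `simulate_head_variance` is hypothetical. It draws one die face per experiment, then draws the number of heads directly as Binomial(3, 0.5) or Binomial(2, 0.5), which has the same distribution as summing 3 or 2 Bernoulli(0.5) flips, and compares the estimate with the exact value 11/16 = 0.6875.

```python
import numpy as np


def simulate_head_variance(num_rolls: int, seed: int = 0) -> float:
    """Estimate Var(#heads) for the die-then-coins experiment (hypothetical helper).

    Assumption: uses NumPy's random Generator instead of TensorFlow Probability.
    """
    rng = np.random.default_rng(seed)
    # One die roll per experiment: integers 1..6 (the upper bound 7 is exclusive).
    faces = rng.integers(1, 7, size=num_rolls)
    # Odd face -> 3 tosses, even face -> 2 tosses.
    num_tosses = np.where(faces % 2 == 1, 3, 2)
    # Heads in n fair tosses is Binomial(n, 0.5); n varies element-wise here.
    heads = rng.binomial(num_tosses, 0.5)
    # Population variance over the per-roll head counts, as np.var(L) above.
    return float(np.var(heads))


exact = 11 / 16  # law-of-total-variance value, 0.6875
estimate = simulate_head_variance(1_000_000)
print(f"simulated: {estimate:.4f}, exact: {exact}")
```

Because every roll contributes its own head count, the estimate converges to 11/16 as `num_rolls` grows; varying `seed` gives run-to-run spread similar to the 0.669 above.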