artificial-intelligence, bayesian-networks, reasoning

Why random numbers in Bayes' Net Sampling


I am trying to wrap my head around sampling in Bayesian Networks (simple unoptimized prior sampling for now). From what I understand, the idea is to produce a limited number of samples and see how they propagate through the network. What I do not understand is why a random number generator is necessary for this process.

Suppose you have a random variable node with a Conditional Probability Distribution (CPD) as follows:

| Color | P(Color) |
|-------|----------|
| Red   | 0.1      |
| Green | 0.2      |
| Blue  | 0.7      |

The introductions I could find say that for each sample you want to take, you should call a random() function yielding a value in [0.0, 1.0), and then check which sub-interval it falls into: Red: [0.0, 0.1), Green: [0.1, 0.3), or Blue: [0.3, 1.0).
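
For concreteness, here is a minimal sketch of that interval check in Python (the names `CPD_COLOR` and `sample_color` are mine, purely for illustration):

```python
import random

# CPD for the Color node, matching the table above.
CPD_COLOR = [("Red", 0.1), ("Green", 0.2), ("Blue", 0.7)]

def sample_color(rng=random):
    """Draw one sample by checking which cumulative sub-interval
    a uniform random number in [0.0, 1.0) falls into."""
    u = rng.random()           # e.g. 0.37
    cumulative = 0.0
    for value, p in CPD_COLOR:
        cumulative += p        # Red: 0.1, Green: 0.3, Blue: 1.0
        if u < cumulative:     # Red: [0.0, 0.1), Green: [0.1, 0.3), Blue: [0.3, 1.0)
            return value
    return CPD_COLOR[-1][0]    # guard against floating-point round-off

samples = [sample_color() for _ in range(10_000)]
print(samples.count("Blue") / len(samples))  # ≈ 0.7
```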

My question is: why call a random number generator at all? After all, you have the probabilities right in front of you. If you decide ahead of time that you want to create n samples, couldn't you just make 0.1*n of the samples Red, 0.2*n Green and 0.7*n Blue? For a child node with its own CPD, you could then split up all red, green and blue samples according to their respective conditional probabilities, again without using a random number generator.
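
Something like the following sketch is what I have in mind (the helper `allocate_deterministically` is hypothetical, just to spell the idea out):

```python
def allocate_deterministically(cpd, n):
    """Assign exactly round(p * n) samples to each value
    instead of drawing random numbers."""
    samples = []
    for value, p in cpd:
        samples.extend([value] * round(p * n))
    return samples

print(allocate_deterministically([("Red", 0.1), ("Green", 0.2), ("Blue", 0.7)], 10))
# ['Red', 'Green', 'Green', 'Blue', 'Blue', 'Blue', 'Blue', 'Blue', 'Blue', 'Blue']
```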

This would still be an approximation, since you are still not reasoning over the complete Joint Probability Distribution. And in the limit n → ∞, this should still approach the correct conditional probabilities, shouldn't it?


Solution

  • The following are comments from Hernan C. Vazquez, taken from an exchange in the context of his first answer. Within this exchange, his comments answered my original question, so I thought I would post them as an answer here.

    You need randomness because if a sample isn't randomly selected, it will probably be biased in some way. You need to ensure that the data is representative of the population, and the way to do this is through a random number generator.

    In other words, suppose you can get heads or tails with probability 0.5 each. If I want to take 2 samples (n = 2) and I use 0.5 * n, then every time heads comes up the next sample is tails, and vice versa, so P(heads | tails) = 1. That is not a representative sample, since it should be P(heads | tails) = 0.5. You are changing the rules of the game.
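
To make the coin example concrete, here is a small numerical check (a sketch; `transition_prob` is a hypothetical helper, not part of any library) comparing the deterministic 0.5 * n allocation with sampling driven by a random number generator:

```python
import random

def transition_prob(seq, given, nxt):
    """Empirical P(next = nxt | current = given) over consecutive pairs."""
    pairs = list(zip(seq, seq[1:]))
    given_pairs = [p for p in pairs if p[0] == given]
    return sum(1 for _, b in given_pairs if b == nxt) / len(given_pairs)

# Deterministic 0.5 * n allocation: heads, tails, heads, tails, ...
deterministic = ["heads", "tails"] * 5000

# Prior sampling driven by a random number generator.
rng = random.Random(0)
randomized = ["heads" if rng.random() < 0.5 else "tails" for _ in range(10_000)]

print(transition_prob(deterministic, "tails", "heads"))  # 1.0: tails is always followed by heads
print(transition_prob(randomized, "tails", "heads"))     # ≈ 0.5: the flips stay independent
```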