If different classes of an application need to draw one or more random numbers, where should a random number generator be initialized in order to produce good random sequences?
In particular, I need to build several decision trees to train a random forest. The construction of each decision tree involves the following steps:
The three steps listed above are repeated for every decision tree, so random numbers are generated many times during the procedure. For example, the second step should ensure that each decision tree is trained on a dataset slightly different from the initial one, so the random number generator should avoid producing identical datasets (or at least make this very unlikely).
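Assuming the second step is ordinary bootstrap sampling (drawing as many rows as the original dataset, with replacement), which the question does not state explicitly, a minimal Python sketch could look like this. `bootstrap_sample` is a hypothetical helper; it takes the generator as a parameter instead of creating its own, so the caller decides where the generator is seeded.

```python
import random

def bootstrap_sample(dataset, rng):
    """Hypothetical helper: draw len(dataset) rows with replacement using rng."""
    return rng.choices(dataset, k=len(dataset))

# One seeded generator produces a different bootstrap dataset on every call.
rng = random.Random(42)
data = list(range(10))
samples = [bootstrap_sample(data, rng) for _ in range(3)]
```

Because each call advances the same generator, successive calls yield different samples, while re-running with the same seed reproduces all of them.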
In essence, this procedure has two sources of randomness:

1. generating N random datasets, each used to train a single decision tree;
2. M random draws from a given dataset.

How many random number generators should I use? Since I have one class that implements the random forest and another that implements the decision tree, I thought I'd initialize one random number generator in the first class (for the first source of randomness) and another in the second class (for the second source). Is this correct?
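One way to arrange the two sources is to seed a single generator in the forest class and hand each tree its own generator derived from it, instead of letting each class seed itself independently (in languages where seeding commonly uses the current time, such as `srand(time(NULL))` in C, objects created in quick succession can end up with identical seeds). The sketch below assumes Python's `random.Random`; the class layout, attribute names and the use of `choice` for the M draws are illustrative assumptions, not the asker's code.

```python
import random

class DecisionTree:
    """Hypothetical tree class: receives a generator instead of seeding one."""

    def __init__(self, dataset, m, rng):
        self.dataset = dataset
        self.rng = rng
        # Second source of randomness: M random draws from this tree's dataset.
        self.extracted = [self.rng.choice(dataset) for _ in range(m)]


class RandomForest:
    """Hypothetical forest class: owns the only explicitly seeded generator."""

    def __init__(self, n_trees, m, seed=None):
        self.n_trees = n_trees
        self.m = m
        self.rng = random.Random(seed)  # seeded exactly once, here
        self.trees = []

    def fit(self, dataset):
        self.trees = []
        for _ in range(self.n_trees):
            # First source of randomness: one bootstrap dataset per tree.
            sample = self.rng.choices(dataset, k=len(dataset))
            # Child generator seeded from the forest's generator, so the
            # whole forest is reproducible from a single seed.
            tree_rng = random.Random(self.rng.getrandbits(64))
            self.trees.append(DecisionTree(sample, self.m, tree_rng))
        return self
```

Fitting two forests with the same `seed` gives identical trees. Deriving child seeds from the parent generator is a simple approach; if stronger guarantees of independence between streams are needed, NumPy's `numpy.random.SeedSequence.spawn` is designed for exactly this.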
In general, what are the guidelines for choosing the correct number of pseudo-random number generators?
It depends on how repeatable you need the sequence to be. For example, if you can't guarantee the order in which the rand() calls are made, and you need to generate the same sequence each time for testing, then you'd need a separate seed/generator for each of these streams of random numbers.
If you don't care about repeatability, just use one generator with one seed and let it run.
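The point about call order can be illustrated with Python's `random.Random` (an assumption; the answer's `rand()` suggests C). Two hypothetical consumers, `a` and `b`, draw values in different interleavings. With one generator per consumer (`draw_separate`), each gets the same values regardless of order; with a single shared generator (`draw_shared`), the values each consumer gets depend on the order of the calls. Both helpers are illustrative, not from any library.

```python
import random

def draw_separate(order, seed_a=1, seed_b=2):
    """One generator per consumer: each stream is independent of call order."""
    gens = {"a": random.Random(seed_a), "b": random.Random(seed_b)}
    out = {"a": [], "b": []}
    for name in order:
        out[name].append(gens[name].random())
    return out

def draw_shared(order, seed=1):
    """One shared generator: what each consumer gets depends on call order."""
    rng = random.Random(seed)
    out = {"a": [], "b": []}
    for name in order:
        out[name].append(rng.random())
    return out

print(draw_separate("abab") == draw_separate("aabb"))  # True
print(draw_shared("abab") == draw_shared("aabb"))      # False
```

If repeatability isn't needed, the shared version is fine: the combined sequence is still a good pseudo-random stream, it just can't be replayed per consumer when the call order changes.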