c++ prng

What is a possible way to bias a random number generator?


I built a word generator: it picks a length and then randomly picks letters of the alphabet to make up words.

The program works, but 99% of the output is rubbish because it does not follow the patterns of the English language: I get as many words containing x and z as words containing e.

What are my options for biasing the RNG so that it uses common letters more often?

I am using rand() from the C standard library (<cstdlib>), seeded with the current time.


Solution

  • The output will still be rubbish, because biasing the random number generator is not enough to construct proper English words: letter frequency alone says nothing about which letters can follow which. But one approach to biasing the RNG is:

    1. Make a histogram of the occurrences of letters in a large English text (the corpus). You'll get something like 500 'e', 3 'x', 1 'q', 450 'a', 200 'b', and so on.
    2. Divide an interval into ranges so that each letter gets a slice whose length is the number of occurrences of that letter in the corpus: 'a' gets [0, 450), 'b' gets [450, 650), ..., 'q' gets [3500, 3501).
    3. Generate a random integer between 0 (inclusive) and the total length of the interval (exclusive) and check which slice it lands in. Any number from 450 to 649 gives you a 'b', but only 3500 gives you a 'q'.
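
The three steps above could be sketched in C++ roughly as follows. This is only an illustration under some assumptions: the function names (letter_histogram, slice_ends, letter_for, random_letter) are made up for this example, only the letters a-z are counted, and std::mt19937 with std::uniform_int_distribution is swapped in for rand() to avoid modulo bias and the possibly small RAND_MAX.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cctype>
#include <numeric>
#include <random>
#include <string>

// Step 1: count how often each letter a-z occurs in the corpus
// (case-insensitive; every other character is ignored).
std::array<long, 26> letter_histogram(const std::string& corpus) {
    std::array<long, 26> counts{};
    for (unsigned char ch : corpus) {
        int lower = std::tolower(ch);
        if (lower >= 'a' && lower <= 'z') {
            ++counts[lower - 'a'];
        }
    }
    return counts;
}

// Step 2: turn the counts into slices. ends[i] is the exclusive upper
// bound of letter i's slice; its lower bound is ends[i - 1] (or 0 for 'a').
std::array<long, 26> slice_ends(const std::array<long, 26>& counts) {
    std::array<long, 26> ends{};
    std::partial_sum(counts.begin(), counts.end(), ends.begin());
    return ends;
}

// Step 3a: map a number n in [0, total) to the letter whose slice contains it.
char letter_for(const std::array<long, 26>& ends, long n) {
    assert(n >= 0 && n < ends.back());
    // Find the first slice whose exclusive end is greater than n.
    // Letters with a count of zero have an empty slice and are never chosen.
    auto it = std::upper_bound(ends.begin(), ends.end(), n);
    return static_cast<char>('a' + (it - ends.begin()));
}

// Step 3b: draw a random number in [0, total) and look up its letter.
// The corpus must contain at least one letter, otherwise total is 0.
char random_letter(const std::array<long, 26>& ends, std::mt19937& rng) {
    assert(ends.back() > 0);
    std::uniform_int_distribution<long> pick(0, ends.back() - 1);
    return letter_for(ends, pick(rng));
}
```

To use it, build the slices once with slice_ends(letter_histogram(corpus)), seed a std::mt19937 (for example from std::random_device), and call random_letter once per letter of the word. The standard library also provides std::discrete_distribution in <random>, which accepts the counts as weights directly and may be simpler if the explicit slices are not needed.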