Linear congruential generator is a good algorithm. But is there any faster algorithm?
As I recall, 8051 has an 8x8=16-bit multiplier, so it might be feasible to implement a lag-n multiply-with-carry generator. The advantage of this technique is that if you can find a suitable safe prime, you can get a very long period with very little arithmetic.
Unfortunately, there don't seem to be an awful lot of safe-prime options with a multiplier that is only eight bits, and I fear the short multiplier may result in some weaknesses that don't normally show up with MWC.
I just knocked this together, and while I'm not 100% certain it implements MWC correctly, it actually passes a surprising number of dieharder tests:
#define STATE_BYTES 7
#define MULT 0x13B /* for STATE_BYTES==6 only */
#define MULT_LO (MULT & 255)
#define MULT_HI (MULT & 256)
uint8_t rand8(void)
{
static uint8_t state[STATE_BYTES] =
{ 0x87, 0xdd, 0xdc, 0x10, 0x35, 0xbc, 0x5c };
static uint16_t c = 0x42;
static int i = 0;
uint16_t t;
uint8_t x;
x = state[i];
t = (uint16_t)x * MULT_LO + c;
c = t >> 8;
#if MULT_HI
c += x;
#endif
x = t & 255;
state[i] = x;
if (++i >= sizeof(state))
i = 0;
return x;
}
As you can see, the multiplier is actually nine bits, but we use a shift-and-add to implement that last bit which the hardware multiplier can't manage.
On further testing, I've found a suitable safe prime of 89 bits which does pass almost all of dieharder. Change these lines:
#define STATE_BYTES 10
#define MULT 0x153 /* for STATE_BYTES==10 only */
and I used this seed for testing:
static uint8_t state[STATE_BYTES] =
{ 0x87, 0xdd, 0xdc, 0x10, 0x35, 0xbc, 0x5c, 0xb6, 0xca, 0x0a, };
static uint16_t c = 0x42;
The seed is just some bits from /dev/random, and you're free to choose your own. However, increasing the state size is basically cheating, because it allows the quality of the seed to play a greater role in the success or failure in randomness tests. It may be that bad seeds could result in not-very-random results.