Search code examples
mathstatisticsprobabilitybit

Statistical probability of N contiguous true-bits in a sequence of bits?


Let's assume I have an N-bit stream of generated bits. (In my case 64kilobits.)

Whats the probability of finding a sequence of X "all true" bits, contained within a stream of N bits. Where X = (2 to 16), and N = (16 to 1000000), and X < N.

For example: If N=16 and X=5, whats the likelyhood of finding 11111 within a 16-bit number.

Like this pseudo-code:

int N = 1<<16; // (64KB)
int X = 5;
int Count = 0;
for (int i = 0; i < N; i++) {
    int ThisCount = ContiguousBitsDiscovered(i, X);
    Count += ThisCount;
}
return Count;

That is, if we ran an integer in a loop from 0 to 64K-1... how many times would 11111 appear within those numbers.

Extra rule: 1111110000000000 doesn't count, because it has 6 true values in a row, not 5. So:

1111110000000000 = 0x // because its 6 contiguous true bits, not 5.
1111100000000000 = 1x
0111110000000000 = 1x
0011111000000000 = 1x
1111101111100000 = 2x

I'm trying to do some work involving physically-based random-number generation, and detecting "how random" the numbers are. Thats what this is for.

...

This would be easy to solve if N were less than 32 or so, I could just "run a loop" from 0 to 4GB, then count how many contiguous bits were detected once the loop was completed. Then I could store the number and use it later.

Considering that X ranges from 2 to 16, I'd literally only need to store 15 numbers, each less than 32 bits! (if N=32)!

BUT in my case N = 65,536. So I'd need to run a loop, for 2^65,536 iterations. Basically impossible :)

No way to "experimentally calculate the values for a given X, if N = 65,536". So I need maths, basically.


Solution

  • Fix X and N, obiously with X < N. You have 2^N possible values of combinations of 0 and 1 in your bit number, and you have N-X +1 possible sequences of 1*X (in this part I'm only looking for 1's together) contained in you bit number. Consider for example N = 5 and X = 2, this is a possible valid bit number 01011, so fixed the last two characteres (the last two 1's) you have 2^2 possible combinations for that 1*Xsequence. Then you have two cases:

    Border case: Your 1*X is in the border, then you have (2^(N -X -1))*2 possible combinations

    Inner case: You have (2^(N -X -2))*(N-X-1) possible combinations.

    So, the probability is (border + inner )/2^N

    Examples:

    • 1)N = 3, X =2, then the proability is 2/2^3

    • 2) N = 4, X = 2, then the probaility is 5/16