Search code examples
algorithmpiupperbound

Upper bound on 4 digit sequences in pi


If this is not the right SE site for this question, please let me know.

A friend shared this interview question he received over the phone, which I have tried to solve myself. I will paraphrase:

The value of pi up to n digits as a string is given.

How can I find all duplicate 4 digit sequences in this string?

This part seems fairly straight forward. Add 4 character sequences to a hash table, incrementing one character at a time. Check if the current 4 character sequence already exists before insertion into the hash table. If so, then you have found a duplicate. Store this somewhere, and repeat the process. I was told this was more or less correct.

The issue I have is on the second question:

What is the upper bound?

n = 10,000,000 was an example.

My algorithm background is admittedly very rusty. My first thought is that the upper bound must be related to n somehow, but I was told it is not.

How do I calculate this?

EDIT:

I would also be open to a solution that disregards the restraint that the upper bound is not related to n. Either is acceptable.


Solution

  • There are only 10,000 possible sequences of four digits (0000 to 9999), so at some point you will have found that every sequence has been duplicated, and there's no need to process further digits.

    If you assume that pi is a perfectly uniform random number generator, then each new digit that's processes results in a new sequence, and after about 20,000 digits, you will have found duplicates for all 10,000 sequences. Given that pi is not perfect, you may need significantly more digits before you duplicate all sequences, but 100,000 would be a reasonable guess at the upper bound.

    Also, since there are only 10,000 possibilities, you don't really need a hash table. You can simply use an array of 10000 counters, (int count[10000]), and increment the count for each sequence you find.