Search code examples
language-agnosticdata-structuresrepeatinfinite-sequence

Detecting repetition with infinite input


What is the most optimal way to find repetition in a infinite sequence of integers?

i.e. if in the infinite sequence the number '5' appears twice then we will return 'false' the first time and 'true' the second time.

In the end what we need is a function that returns 'true' if the integer appeared before and 'false' if the function received the integer the first time.

If there are two solutions, one is space-wise and the second is time-wise, then mention both. I will write my solution in the answers, but I don't think it is the optimal one.

edit: Please don't assume the trivial cases (i.e. no repetitions, a constantly rising sequence). What interests me is how to reduce the space complexity of the non-trivial case (random numbers with repetitions).


Solution

  • I'd use the following approach:

    Use a hash table as your datastructure. For every number read, store it in your datastructure. If it's already stored before you found a repetition.

    If n is the number of elements in the sequence from start to the repetition, then this only requires O(n) time and space. Time complexity is optimal, as you need to at least read the input sequence's elements up to the repetition point.

    How long of a sequence are we talking (before the repetition occurs)? Is a repetition even guaranteed at all? For extreme cases the space complexity might become problematic. But to improve it you will probably need to know more structural information on your sequence.

    Update: If the sequence is as you say very long with seldom repetitions and you have to cut down on the space requirement, then you might (given sufficient structural information on the sequence) be able to cut down the space cost.

    As an example: let's say you know that your infinite sequence has a general tendency to return numbers that fit within the current range of witnessed min-max numbers. Then you will eventually have whole intervals that have already been contained in the sequence. In that case you can save space by storing such intervals instead of all the elements contained within it.