Search code examples
algorithmperformancetimedistinct-values

Algorithm for the largest subarray of distinct values in linear time


I'm trying to come up with a fast algorithm for, given any array of length n, obtaining the largest subarray of distinct values.

For example, the largest subarray of distinct values of

[1, 4, 3, 2, 4, 2, 8, 1, 9]

would be

[4, 2, 8, 1, 9]

This is my current solution, I think it runs in O(n^2). This is because check_dups runs in linear time, and it is called every time j or i increments.

arr = [0,...,n]
i = 0
j = 1
i_best = i
j_best = j
while i < n-1 and j < n:
    if check_dups(arr, i j): //determines if there's duplicates in the subarray i,j in linear time
        i += 1
    else:
        if j - i > j_best - i_best:
            i_best = i
            j_best = j
        j += 1
return subarray(arr, i_best, j_best)

Does anyone have a better solution, in linear time?

Please note this is pseudocode and I'm not looking for an answer that relies on specific existing functions of a defined language (such as arr.contains()). Thanks!


Solution

  • Consider the problem of finding the largest distinct-valued subarray ending at a particular index j. Conceptually this is straightforward: starting at arr[j], you go backwards and include all elements until you find a duplicate.

    Let's use this intuition to solve this problem for all j from 0 up to length(arr). We need to know, at any point in the iteration, how far back we can go before we find a duplicate. That is, we need to know the least i such that subarray(arr, i, j) contains distinct values. (I'm assuming subarray treats the indices as inclusive.)

    If we knew i at some point in the iteration (say, when j = k), can we quickly update i when j = k+1? Indeed, if we knew when was the last occurrence of arr[k+1], then we can update i := max(i, lastOccurrence(arr[k+1]) + 1). We can compute lastOccurrence in O(1) time with a HashMap.

    Pseudocode:

    arr = ... (from input)
    map = empty HashMap
    i = 0
    i_best = 0
    j_best = 0
    for j from 0 to length(arr) - 1 inclusive:
        if map contains-key arr[j]:
            i = max(i, map[arr[j]] + 1)
        map[arr[j]] = j
        if j - i > j_best - i_best:
            i_best = i
            j_best = j
    return subarray(arr, i_best, j_best)