I'm trying to come up with a fast algorithm for, given any array of length n, obtaining the largest subarray of distinct values.
For example, the largest subarray of distinct values of
[1, 4, 3, 2, 4, 2, 8, 1, 9]
would be
[4, 2, 8, 1, 9]
This is my current solution; I think it runs in O(n^2), because check_dups runs in linear time and it is called every time j or i increments.
arr = ... (input array of length n)
i = 0
j = 1
i_best = i
j_best = j
while i < n-1 and j < n:
    if check_dups(arr, i, j): // determines, in linear time, whether the subarray arr[i..j] contains duplicates
        i += 1
    else:
        if j - i > j_best - i_best:
            i_best = i
            j_best = j
        j += 1
return subarray(arr, i_best, j_best)
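In case a runnable version helps, here is a rough Python transcription of the pseudocode above (check_dups and the function name are just my illustrative names; I also start the best window at a single element rather than (0, 1) so the edge case where the first two values are equal behaves sensibly):

    def check_dups(arr, i, j):
        # Linear-time check: does the inclusive window arr[i..j] contain a duplicate?
        seen = set()
        for k in range(i, j + 1):
            if arr[k] in seen:
                return True
            seen.add(arr[k])
        return False

    def largest_distinct_subarray_quadratic(arr):
        # O(n^2) version following the pseudocode above (inclusive window indices).
        n = len(arr)
        if n < 2:
            return arr[:]
        i, j = 0, 1
        i_best, j_best = 0, 0   # a single element is always distinct
        while i < n - 1 and j < n:
            if check_dups(arr, i, j):
                i += 1          # duplicates: shrink the window from the left
            else:
                if j - i > j_best - i_best:
                    i_best, j_best = i, j
                j += 1          # no duplicates: grow the window to the right
        return arr[i_best:j_best + 1]

    print(largest_distinct_subarray_quadratic([1, 4, 3, 2, 4, 2, 8, 1, 9]))
    # -> [4, 2, 8, 1, 9]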
Does anyone have a better solution, in linear time?
Please note this is pseudocode and I'm not looking for an answer that relies on specific existing functions of a defined language (such as arr.contains()). Thanks!
Consider the problem of finding the largest distinct-valued subarray ending at a particular index j. Conceptually this is straightforward: starting at arr[j], you go backwards and include elements until you hit a duplicate. For example, in [1, 4, 3, 2, 4, 2, 8, 1, 9], the largest distinct-valued subarray ending at index 5 (the second 2) is [4, 2], because going back any further would include the 2 at index 3.

Let's use this intuition to solve the problem for every j from 0 up to length(arr) - 1. We need to know, at any point in the iteration, how far back we can go before we find a duplicate. That is, we need to know the least i such that subarray(arr, i, j) contains distinct values. (I'm assuming subarray treats the indices as inclusive.)
If we knew i at some point in the iteration (say, when j = k), can we quickly update it when j = k+1? Indeed: if we knew the index of the last occurrence of arr[k+1], we could update i := max(i, lastOccurrence(arr[k+1]) + 1). And we can look up lastOccurrence in O(1) time by keeping a HashMap from each value to the index where it was last seen.
Pseudocode:
arr = ... (from input)
map = empty HashMap
i = 0
i_best = 0
j_best = 0
for j from 0 to length(arr) - 1 inclusive:
    if map contains-key arr[j]:
        i = max(i, map[arr[j]] + 1)
    map[arr[j]] = j
    if j - i > j_best - i_best:
        i_best = i
        j_best = j
return subarray(arr, i_best, j_best)
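If you want to see it run, here is a minimal Python version of the pseudocode above, with a dict playing the role of the HashMap (the names are mine, nothing standard):

    def largest_distinct_subarray(arr):
        # O(n) sliding window: last_seen maps each value to the index of its most recent occurrence.
        last_seen = {}
        i = 0                    # left edge of the current all-distinct window
        i_best, j_best = 0, 0    # inclusive bounds of the best window so far
        for j, value in enumerate(arr):
            if value in last_seen:
                # Move i past the previous occurrence of this value, but never backwards.
                i = max(i, last_seen[value] + 1)
            last_seen[value] = j
            if j - i > j_best - i_best:
                i_best, j_best = i, j
        return arr[i_best:j_best + 1]

    print(largest_distinct_subarray([1, 4, 3, 2, 4, 2, 8, 1, 9]))
    # -> [4, 2, 8, 1, 9]

Each index enters and leaves the window at most once, and each map operation is expected O(1), so the whole pass is linear expected time with extra space proportional to the number of distinct values.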