Design and analyze a linear time algorithm to determine whether there exists an element in a list of n elements which repeats itself at least n/10 times in the list.
How can I do this? I'm posting my own idea as an answer.
I assume the elements are comparable.
Perform a selection algorithm for the: n/10th, 2n/10th, ..., 9n/10th, 10(n/10)th smallest elements1
These are your candidates. Check the #occurrences for each of them, and if one of them repeats at least n/10 times the answer is true
. Otherwise, it is false
.
Proof:
If an element appears at least n/10 times, then it must "intersect" with k*n/10
for some k
(in a sorted list)2. Thus, this element will be chosen as a "candidate", and you will later discover (by counting) exactly how many times it appears, and will return true
.
If no element is repeated n/10
times, the algorithm will trivially return false
in the last step of checking each candidate.
Complexity:
Each selection algorithm is O(n)
, and it is done 10 times. Also for each candidate requires linear scan on the list, which is also O(n)
totaling in O(n)
overall—but with terrible constants.
Explanations:
(1) Selection algorithm will find the element which would be in the index n/10, 2n/10,...9n/10 in a sorted list, and the algorithm itself is only O(n)
(2) Let's look at the example of [1,2,..,100,100,..,100]
(11 times 100).
Note that the list is sorted and the element 100 appears in: list[9*n/10]
(index 9*n/10
). The idea of the selection algorithm is, that—even if you shuffle the list—select(list,9*n/10)
will always return the same element—in this case 100—since it is the 9n/10
th element in the sorted list (this is what the algorithm does).
Now, you can see that for each element (let it be e
) that repeats n/10
times, there is some index i
such that in the sorted version of the list, all elements in indices i,i+1,...,i+n/10
will be e
. One of these indices must be one of k*n/10
for some k
(convince yourself why!). Thus, the selection algorithm on k*n/10
will yield e
.