Search code examples
algorithmlinear-search

Design and analyze a linear time algorithm


Design and analyze a linear time algorithm to determine whether there exists an element in a list of n elements which repeats itself at least n/10 times in the list.

How can I do this? I'm posting my own idea as an answer.


Solution

  • I assume the elements are comparable.

    Perform a selection algorithm for the: n/10th, 2n/10th, ..., 9n/10th, 10(n/10)th smallest elements1

    These are your candidates. Check the #occurrences for each of them, and if one of them repeats at least n/10 times the answer is true. Otherwise, it is false.

    Proof:
    If an element appears at least n/10 times, then it must "intersect" with k*n/10 for some k (in a sorted list)2. Thus, this element will be chosen as a "candidate", and you will later discover (by counting) exactly how many times it appears, and will return true.

    If no element is repeated n/10 times, the algorithm will trivially return false in the last step of checking each candidate.

    Complexity:
    Each selection algorithm is O(n), and it is done 10 times. Also for each candidate requires linear scan on the list, which is also O(n) totaling in O(n) overall—but with terrible constants.


    Explanations:

    (1) Selection algorithm will find the element which would be in the index n/10, 2n/10,...9n/10 in a sorted list, and the algorithm itself is only O(n)

    (2) Let's look at the example of [1,2,..,100,100,..,100] (11 times 100).
    Note that the list is sorted and the element 100 appears in: list[9*n/10] (index 9*n/10). The idea of the selection algorithm is, that—even if you shuffle the list—select(list,9*n/10) will always return the same element—in this case 100—since it is the 9n/10th element in the sorted list (this is what the algorithm does).

    Now, you can see that for each element (let it be e) that repeats n/10 times, there is some index i such that in the sorted version of the list, all elements in indices i,i+1,...,i+n/10 will be e. One of these indices must be one of k*n/10 for some k (convince yourself why!). Thus, the selection algorithm on k*n/10 will yield e.