Search code examples
arraysalgorithmdivide-and-conquer

Removing duplicates from an array using divide and conquer in O(nlogn) time


You are given an Array A of n requests for 2010 olympic tickets. The array is ordered by the time of the request so that A(1) is the first to arrive and A(2) is the second to arrive etc. Each request contains a ten digit telephone number. In order to try to be fair the olympic organizers have made a rule that there can only be one request from each telephone number. It has been noticed that array A contains more than one request from some telephone numbers. Write an O(nlogn) time divide and conquer algorithm to remove from A all requests from the same telephone number except the first received. The final output should be array A containing m<=n requests from a unique telephone number. Also the requests in A should remain in the same order as they were before the duplicates were removed.

I get how this can be done if the array was sorted by telephone number however I do not see how it is possible when the array is sorted by request time.


Solution

  • A simple modification to merge sort will do the trick. Merge sort is O(n log n). It's also stable, meaning that items with equal keys will remain in their same relative order. The only difference here is that you want to eliminate duplicates. The only change to the code is in the comparison and copying of items. For example, the inner loop of a standard merge sort looks like this:

    while (leftIndex < leftMax && rightIndex < rightMax)
    {
        if (a[leftIndex] <= a[rightIndex])
        {
            output[outputIndex] = a[leftIndex];
            ++leftIndex;
        }
        else
        {
            output[outputIndex] = a[rightIndex];
            ++rightIndex;
        }
    }
    

    You would modify the code so you don't copy equal items:

    while (leftIndex < leftMax && rightIndex < rightMax)
    {
        if (a[leftIndex] < a[rightIndex])
        {
            output[outputIndex] = a[leftIndex];
            ++leftIndex;
        }
        else if (a[leftIndex] > a[rightIndex])
        {
            output[outputIndex] = a[rightIndex];
            ++rightIndex;
        }
        else
        {
            // items are equal.
            // increment the right, but don't copy anything
            ++rightIndex;
        }
    }
    

    You could also do this with a standard merge sort and then do a final pass over the sorted array to remove duplicates.

    You could use Quicksort with a custom comparison function that compares the telephone number and the date/time. Then do a final pass over the sorted array to remove duplicates.

    Both Quicksort and Mergesort are considered divide and conquer algorithms.