Search code examples
c++sortingdata-structuresbinary-search

Straight and Binary insertion sort


Hello I was asked to improve insertion sort by using binary search instead of linear. The problem is that the best case is now O(n log n) instead of O(n) due to data comparisons, which made the program if not slower in some cases equal. Here is my binary insertion sort code:

void InsertionSort(int data[], int size)
{
    int index = -1;
    for(int i = 1,temp,j;i<size;i++)
    {
        temp = data[i];//DM O(N)
        int high = i,low = 0,mid;//DM O(N)
        while(low <= high)//DC O(nlogn)
        {
            mid = (low + high) /2;
            if(temp < data[mid])
            {
                high = mid - 1;
                index = mid;
            }

            else if (temp > data[mid])
            {
                low = mid + 1;      
            }

            else if(data[mid] == temp)
            {
                index = mid;
                break; 
            } 

        }
        for(j = i;j > index;j--)
        {
            data[j] = data[j-1];//DC Worst Case O(n*n) but the exact is summation of n(n+1) / 2 nad best case o(1)
        }
        data[j] = temp;//DM O(n)
    }   
}

Solution

  • You can start your binary search with a biased stage that favours the best case. Instead of going directly to (low+high)/2, start at position i-1, then i-2, then i-4, i-8, i-16, i-32... until you find a smaller element, or until i-whatever gets lower than low. Then continue with the ordinary binary search.

    Note that this optimisation comes at a cost. The best case ---sorted or almost sorted data--- takes O(N) time, but the average case and the worst case get a bit slower with respect to the simple binary search version.

    void InsertionSort (int data[], int size)
    {
        int i, j, high, low, mid, hop;
        int temp;
    
        for (i=1; i<size; i++)
        {
            temp = data[i];
    
            high = i;
            low = 0;
            hop = 1;
    
            do
            {
                mid = high - hop;
                hop <<= 1;
    
                if (temp < data[mid])
                    high = mid;
                else
                    low = mid + 1;
            }
            while (low+hop <= high);
    
            while (low != high)
            {
                mid = (low + high) / 2;
    
                if (temp < data[mid])
                    high = mid;
                else
                    low = mid + 1;
            }
    
            for(j=i; j>low; j--)
                data[j] = data[j-1];
    
            data[j] = temp;
        }   
    }
    

    Note also that high is assigned mid and not mid+1. The case where temp==data[mid] is treated exactly as the case where temp>data[mid]. This is to keep a good property of insertion sort: it is a stable sort. It makes no difference when sorting plain integers, though.