Search code examples
ralgorithmheapsort

Problems with an implementation of Heapsort algorithm in R


I would like to create my own Heapsort algorithm in R.
That is my code

heapify <- function(array, n, i)
{
  parent <- i
  leftChild <- 2 * i + 1
  rightChild <- 2 * i + 2

  if ((leftChild < n) & (array[parent] < array[leftChild]))
  {
    parent = leftChild
  }

  if ((rightChild < n) & (array[parent] < array[rightChild]))
  {
    parent = rightChild
  }

  if (parent != i)
  {
    array = replace(array, c(i, parent), c(array[parent], array[i]))
    heapify(array, n, parent)
  }
}

heapSort <- function(array)
{
  n <- length(array)

  for (i in (n+1):1)
  {
    heapify(array, n, i)
  }

  for (i in n:1)
  {
    array = replace(array, c(i, 0), c(array[0], array[i]))
    heapify(array, i, 1)
  }
  print(array)
}

However that implementation seems to be incorrect. That's an example of an input and output.

array <- c(5, 14, 3, 70, 64)
heapSort(array)

Output: [1]  5 14  3 70 64

I have spent quite a while and I have no idea where the problem is. I would appreciate any hints or tips.


Solution

  • My guess is that you were trying to convert the algorithm posted on GeeksforGeeks where they implement this in many zero based languages. This is one of the sources of your problem (R starts indexing at 1 instead of 0).

    Base Zero Indexing Issues:

    Example 1:

                             ## We also need to swap these indices
    array = replace(array, c(i, 0), c(array[0], array[i]))
    heapify(array, i, 1)
    
    Should be:
    
    array <- replace(array, c(i, 1), array[c(1, i)])
    array <- heapify(array, i, 1)
    

    Example 2:

    leftChild <- 2 * i + 1
    rightChild <- 2 * i + 2
    
    Should be:
    
    leftChild <- 2 * (i - 1) + 1
    rightChild <- 2 * (i - 1) + 2
    

    Pass By Reference Assumption

    In R, you cannot pass an object by reference (see this question and answers Can you pass-by-reference in R?). This means that when we call a recursive function we must assign it and the recursive function must return something.

    In heapify we must return array. Also every call to heapify we must assign array to the output.

    Here is the amended code:

    heapify <- function(array, n, i)
    {
        parent <- i
        leftChild <- 2 * (i - 1) + 1
        rightChild <- 2 * (i - 1) + 2
    
        if ((leftChild < n) & (array[parent] < array[leftChild]))
        {
            parent <- leftChild
        }
    
        if ((rightChild < n) & (array[parent] < array[rightChild]))
        {
            parent <- rightChild
        }
    
        if (parent != i) {
            array <- replace(array, c(i, parent), array[c(parent, i)])
            array <- heapify(array, n, parent)
        }
    
        array
    }
    
    heapSort <- function(array)
    {
        n <- length(array)
    
        for (i in floor(n / 2):1) {
            array <- heapify(array, n, i)
        }
    
        for (i in n:1) {
            array <- replace(array, c(i, 1), array[c(1, i)])
            array <- heapify(array, i, 1)
        }
    
        array
    }
    

    Here are some tests (note this algorithm is extremely inefficient in R.. do not try on vectors much larger than below):

    array <- c(5, 14, 3, 70, 64)
    heapSort(array)
    [1]  3  5 14 64 70
    
    set.seed(11)
    largerExample <- sample(1e3)
    
    head(largerExample)
    [1] 278   1 510  15  65 951
    
    identical(heapSort(largerExample), 1:1e3)
    [1] TRUE