Search code examples
algorithmsortingluatime-complexityin-place

"Lengthless" fast sorting algorithm


I've been looking into sorting algorithms. So far, all the sorting algorithms I've found either rely on a known length (pretty much all sort algos. I can't use them because "proper" length is O(n)), or are slower than quicksort (e.g. insertion sort).

In Lua, there are 2 concepts of length:

  • Proper sequence length
    • Is O(n)
    • Used by ipairs etc
  • Sequence length
    • Is O(log n)
    • Has holes (nil values)
    • Used by table.insert etc

I've looked into heapsort, but heapsort needs to build a heap, then sort. It doesn't do both as a single operation, which means it still suffers from the O(n) length problem.

With insertion sort, you just run the insertion sort algorithm until you hit the first nil. This sorts only the "proper sequence" part of a table (that is, the keys from 1 to n without any nil values), but insertion sort is slower than quicksort.

Are there any in-place sorting algorithms that, like insertion sort, don't depend on length, but with performance comparable to that of quicksort?

Example insertion sort code (with some help from wikipedia):

function isort(t)
  -- In-place insertion sort that never uses the length operator.
  -- Stops at the first nil, as expected. Breaks if you use "for i = 1, #t do"
  for i in ipairs(t) do
      local j = i
      while j > 1 and t[j-1] > t[j] do
          t[j], t[j-1] = t[j-1], t[j]
          j = j - 1
      end
  end
end

local t = {6, 5, 3, 1, 7, 2, 4, nil, 1, 1, 8, 3, 4, nil, nil, 1}
isort(t)
io.write("{")
if #t > 0 then
  io.write(tostring(t[1]))
  for i=2, #t do
    io.write(", ")
    io.write(tostring(t[i]))
  end
end
io.write("}\n")
-- stdout:
-- {1, 2, 3, 4, 5, 6, 7, nil, 1, 1, 8, 3, 4, nil, nil, 1}

Solution

  • Since the sort itself must take at least O(n log n), an extra O(n) scan doesn't seem like it would invalidate the algorithm. Using quadratic algorithms such as insertion or bubble sort is false economy.

    You could use the heapsort variant where you simply iteratively insert into a growing heap, rather than using the O(n) buildheap algorithm. Heapsort is definitely O(n log n), even if you build the heap incrementally, but I doubt whether it is competitive with quicksort. (It's definitely competitive with insertion sort for large inputs, particularly large inputs in reverse order.)

    You can see pseudocode for standard heapsort in Wikipedia. My pseudocode below differs in that it doesn't require the size of the array as a parameter, but instead returns it as the result. It also uses 1-based vectors rather than 0-based, since you are using Lua, so a is assume to run from a[1] to a[count] for some value of count.

     procedure heapsort(a):
         input: an array of comparable elements
         output: the number of elements in the array.
    
         (Heapify successive prefixes of the array)
         count ← 1
         while a has an element indexed by count:
             siftup(a, count)
             count ← count + 1
         count ← count - 1
    
         (Extract the sorted list from the heap)
         i ← count
         while i > 1:
             swap(a, 1, i)
             i ← i - 1
             siftdown(a, i)
    
         return count
    

    siftup and siftdown are the standard heap functions, here presented in the 1-based version. The code provided uses a standard optimization in which the sifting is done with a single rotation instead of a series of swaps; this cuts the number of array references significantly. (The swap in the heapsort procedure could be integrated into siftdown for a slight additional savings but it obscures the algorithm. If you wanted to use this optimization, change val ← a[1] to val ← a[count + 1]; a[count + 1] ← a[1] and remove the swap from heapsort.)

    In a 1-based heap, the parent of node i is node floor(i/2) and the children of node i are nodes 2i and 2i+1. Recall that the heap constraint requires that every node be no less than its parent. (That produces a minheap, which is used to produce a descending sort. If you want an ascending sort, you need a maxheap, which means changing the three value comparisons below from > to <.)

    procedure siftup(a, count):
        input: a vector of length count, of which the first count - 1
               elements satisfy the heap constraint.
        result: the first count elements of a satisfy the heap constraint.
    
        val ← a[count]
        loop:
            parent ← floor(count / 2)
            if parent == 0 or val > a[parent]:
                a[count] ← val
                return
            else
                a[count] ← a[parent]
                count ← parent
    
    procedure siftdown(a, count):
        input: a vector of length count which satisfies the heap constraint
               except for the first element.
        result: the first count elements of a satisfy the heap constraint.
    
        val ← a[1]
        parent ← 1
        loop :
            child ← 2 * parent
            if child < count and a[child] > a[child + 1]:
                child ← child + 1
            if count < child or not (val > a[child]):
                a[parent] ← val
                return
            else
                a[parent] ← a[child]
                parent ← child