Search code examples
pythonsortingmergesortinsertion-sorttimsort

Python merge sort + insertion sort hybrid Tim sort


I have already made code for insertion sort and merge sort. Now I want to implement my insertion sort and merge sort into a Tim sort.

I can see that the example of Tim sort uses start, mid and end inputs but it should be possible to do this without no?

I would like to keep my merge and insertion sorts as they are if possible because of input outputs fits well with the rest of my code.

from random import randint
minrun = 32

def insertion_sort(in_data):
    s_data = list(in_data)
    for i in range(1, len(s_data)):
        key = s_data[i]
        j = i - 1
        while j >= 0 and key < s_data[j]:
            s_data[j + 1] = s_data[j]
            j -= 1
        s_data[j + 1] = key
    return s_data

def merge(a, b):
    c = []
    a_idx, b_idx = 0, 0
    while a_idx < len(a) and b_idx < len(b):
        if a[a_idx] < b[b_idx]:
            c.append(a[a_idx])
            a_idx += 1
        else:
            c.append(b[b_idx])
            b_idx += 1
    if a_idx == len(a):
        c.extend(b[b_idx:])
    else:
        c.extend(a[a_idx:])
    return c

def merge_sort(a):
    if len(a) <= 1:
        return a
    left, right = merge_sort(a[:len(a) // 2]), merge_sort(a[len(a) // 2:])
    return merge(left, right)

def tim_sort(in_data):
    n = len(in_data)

    for start in range(0, n, minrun):
        end = min(start + minrun - 1, n - 1)
        in_data = insertion_sort(in_data, start, end)

    curr_size = minrun
    while curr_size < n:
        for start in range(0, n, curr_size * 2):
            mid = min(n - 1, start + curr_size - 1)
            end = min(n - 1, mid + curr_size)
            in_data = merge_sort(in_data, start, mid, end)
        curr_size *= 2
    return in_data

def create_array(size=10, max=50):
    from random import randint
    return [randint(0, max) for _ in range(size)]

I found this example of Tim sort but I struggle with how to make it work within my code.


Solution

  • 27 November 2020

    It is not clear where you obtained this example, but it certainly is not timsort. If you want to implement timsort, the first thing to do is read and understand Tim Peters description of the algorithm:

    https://svn.python.org/projects/python/trunk/Objects/listsort.txt

    This is the definitive document regarding timsort. You can find all kinds of rubbish with google. The only reference I have ever found that might be worth reading to get your feet wet is:

    https://www.infopulse.com/blog/timsort-sorting-algorithm

    which is lightweight, but fairly complete and not seriously incorrect in any way. It does, however, omit any consideration of galloping, which is the trickiest part of the algorithm.

    It is important to realize python is a dynamic language so dumbly implementing timsort will produce something that uses large excesses of memory due to internal object allocation. Timsort requires:

    • sorting in-place
    • stability
    • maintaining an invariant
    • galloping
    • thorough testing

    Sorting in-place in python implies indexing the data list manually. If you use slices, you allocate and dispose of memory each time.

    There are three python implementations on the web that are worth looking at for guidance that I am aware of:

    1 https://github.com/reingart/pypy/blob/master/rpython/rlib/listsort.py

    2 https://gist.github.com/ruminations/89a045dc0ef7edfb92304a0de0752ee0

    3 https://github.com/hu-ng/timsort

    The first is part of the pypy trunk and is implemented in rpython. It appears to be an adaptation of the cpython implementation. rpython is a restricted subset of python intended for static compilation.

    The second is a well tested implementation that is well documented and fairly readable. The last is apparently a university exercise, and appears to be complete and correct, but not well tested.

    You can find dozens of other attempts at python implementation of timsort but all I have seen either fail to fulfill basic requirements or are clearly incorrect.

    Finally, if you expect someone to help you adapt your code, you should at least link to it, but better to provide it directly, for neither mergesort nor insort are difficult to code.