Search code examples
pythonalgorithmsortingbucket-sort

What is this bucket sort implementation doing?


This is my code for bucket sort in Python.

from random import randrange


def insertion_sort(aList):
    for i in range(1, len(aList)):
        for j in range(i, 0, -1):
            if aList[j] < aList[j-1]:
                aList[j], aList[j-1] = aList[j-1], aList[j]
    return aList

def bucket_sort(aList):
    buckets =  [[]] * len(aList)
    for index, value in enumerate(aList):
        buckets_index = value * len(aList) // (max(aList) + 1)
        buckets[buckets_index].append(value)

answer = []

for bucket in buckets:
    answer.extend(insertion_sort(bucket))
    # answer += insertion_sort(bucket)

print(buckets[0])
print("\n")
# return answer


aList = [randrange(10) for _ in range(100)]
print(aList)
print("\n")
answer = bucket_sort(aList)
#print(answer)

What is happening? When I run the code, I always find that the first list in buckets is already sorted and the other lists in buckets are all copies of it. Do I need the insertion sort for each list? What would I use the "answer" variable for?!

I'm mainly relying on this visualization.


Solution

  • One thing that i notice right off the bat is that you initialize your variable buckets as buckets = [[]] * len(aList). This makes a list of identical copies of the empty list. As such, any modification of this list is replicated in every element of buckets. Change this line to:

    buckets =  [[] for _ in xrange(len(aList))]
    

    To check if the lists inside the list are separate object, you could check their id's:

    print [id(x) for x in buckets]
    

    This should print a list of unique numbers.