Search code examples
pythonalgorithmmergesort

Merge Sort does not work on large data set


I'm trying to use a merge sort algorithm to count the inversions in an array that contains all the numbers in the range 1 to 100,000. It works fine when I input a small data set but doesn't return the correct answer when I input the file containing the larger array. Is there a problem with the way I'm inputting the file, maybe? I've never had to input a file to an array before, so I can't say definitively whether I'm doing it correctly.

file = open('IntegerArray.txt','r')
integerArray = []
for line in file:
    integerArray.append(line)


inversions = 0

def divide(list):
    if len(list) > 1:
        middle = int(len(list) / 2)
        left = divide(list[:middle])
        right = divide(list[middle:])

        #left = divide(left)
        #right = divide(right)

    elif len(list) <= 1:
        return list

    return merge(left,right)

def merge(left,right):
    global inversions
    result = []

    while left != [] and right != []:
        if left[0] < right[0]:
            result.append(left[0])
            if len(left) > 1:
                left = left[1:]
            else:
                left = []
        elif left[0] > right[0]:
            result.append(right[0])
            inversions += len(left)
            if len(right) > 1:
                right = right[1:]
            else:
                right = []

    if left != []:
        result += left
    if right != []:
        result += right

    return result

divide(integerArray)
print(inversions)

This should return 2407905288 but returns 2397819672 instead.


Solution

  • Seems it should not work for most of cases with numbers bigger than 9! You're keeping numbers in a list of strings. So your comparator in merge functions compares two strings, so for example 2 is bigger than 12!

    At least you need to change your first lines to:

    file = open('IntegerArray.txt','r')
    integerArray = []
    for line in file.readlines():
        integerArray.append(int(line.strip()))