Search code examples
vbapseudocodemergesort

Merge Sort algorithm isn't merging properly


So, I do know what a merge sort is supposed to do, and I can somewhat visualize it now. Recursively splitting until only one element is left in the array, since an array of one element is already sorted, it reduces the amount of work needed for each recursion, and appending an already sorted array to another array is a lot less computationally intensive than to sort by normal iteration.

I have two major problems right now. One, on the splitting (which may be a simple one but it will throw off everything if not fixed), when I split them by going low -> mid (+/- 1) and mid -> high, I get a problem where the base case is not properly tested and returns prematurely, causing an unsorted array. To give an example, as I quote my reply from another forum, "if I have a middle of 5, hi of 9 and low of 0, then I have to either split it right from 0 to 5 and left from 6 to 9, or right 0 to 4 and left 5 to 9. The problem I'm having is that if you split it again, say from 6 to 9, I have a middle of 7 due to rounding, meaning the right will have only 6 to 7, making it fail the case of hi - low > 2 since 7 - 6 equals 1 and is less than 2, leaving 2 potentially unsorted elements."

Now, this happens either way, as adding +/- to mid for either will result in a sort of odd low number which doesn't work well. How do I get around this?

Second, when it comes to merging, how do I properly (and efficiently) check whether or not B'sand C's array bounds are appropriate. Would I need another conditional statement to check whether or not PosB and PosC are in bounds, and if one isn't, how can I appropriately (and neatly) add on whats left of the other array to the final array.

Thanks in advance. This is supposed to be in visual basic, but for now pseudocode seems to be the best way to approach these problems rather than stressing correct syntax.

A[] = { 28, 39, 48, 27, 9, 88, 11, 4, 7} ' Global variable, disregard bad programming practices

Function Split(int low, int hi){ ' Adding brackets because not only am I used to java, it should help readability
    if (hi - low) < 2 then
        return array of elements from [low, hi]
    end if

    mid = (hi + low) / 2
    B[] = split(low, mid-1) ' Either I do mid-1 or mid+1, the results seem the same
    C[] = split(mid, hi) ' Same problems as above
    D[] = Merge(B[], C[])
    return D[]
End Function

Function Merge(B[], C[]) ' I use two arrays because I figured it'd be the easiest to work with.
    int PosB = 0, PosC = 0, PosMax = (b.length + c.length) -1 ' PosB and PosC keep track of the positions in the cells of B and C respectively. 
                                                                'PosMax keeps track of the max cell that E[] (declared below) should have as well as the max number for the for loop
    E[num] ' Declare and iniatialize an array that is supposed to hold the combined values of B and C, sorted.
    for i = 0 to num
        if PosC >= c.length or B[PosB] < C[PosC] ' Checks if C[] already has added everything it has to E[] and if so, proceeds to add the
                                                    ' supposedly already sorted array B[]'s elements to E[]. Emphasis on "supposedly". A problem here is it does not check for if PosB
                                                    ' is out of bounds, which is a huge problem with primitive arrays. Also checks if the current element in C is greater than the
                                                    ' current element in B, and if so, add the element in B and increment.
            E[i] = B[PosB]
            PosB += 1 ' Does not check whether or not PosB is at the end, gotta figure a way to check
        Else
            E[i] = C[PosC]
            PosC += 1
        End If
    Next
End Function 

Solution

  • The problem is, that the borders of the array are inclusive which means:

    When you access the positions from 6-8 there are 3 elements in your result (6,7,8). The logical result is: If you want just one element described by a range in an array you would have to write [6-6]; [6-7] means there are still two elements (6 and 7). Lets look at the following code with an example:

    if (hi - low) < 2 then
        return array of elements from [low, hi]
    

    What happens when you give the range from 6-7 to this function? 7-6=1 -> true -> return. But in the range from 6-7 are still two elements. So for fixing this issue it would be enough to write (hi - low) < 1 or easier to read (hi - low) == 0.

    The next point is more about VBA so it's just a thought as I'm not very familiar with VBA:

    mid = (hi + low) / 2
    

    If this returns an integer the result is probably rounded to the lower value ( (3+4)/2=3 ). If so I would write the following like:

    B[] = split(low, mid)
    C[] = split(mid + 1, hi)
    

    The reason for that is that mid is already the lower border (due to rounding). Substracting 1 could cause some issues when the border is close to 0 as it could result in negative values.

    For the second part:

    It would be easier to split the process into two:

    E[num]' I don't know what num means but I suppose it's correct here
    int PosE = 0
    'adding elements to the new array while they can be compared to each other
    while(c.length > 0 and b.length > 0)
        if(C[PosC] < B[PosB])
           E[PosE] = C[PosC]
           PosC += 1
        Else
           E[PosE] = B[PosB]
           PosB += 1
        End If
        PosE += 1
    Next
    
    'one of the array B or C (or both) is empty now. The remaining elements have to be added. The order doesn't matter any more.
    
    for i = PosC to c.length 'I don't know if this is possible in VBA but I think you know what I mean: adding all the remaining elements of C to E (if there are any)
        E[PosE] = C[PosC]
        PosC += 1
        PosE += 1
    Next
    
    'doing the same with B; it could happen that one of those loops never run
    for i = PosB to b.length
        E[PosE] = B[PosB]
        PosB += 1
        PosE += 1
    Next
    

    I hope this works because I've never written anything in VB.

    If there are any questions remaining feel free to ask.