Search code examples
pythonlinked-listmergesort

I don't understand the need of these conditions. How and when can a linked_list be None?


(In split function) I ran it without these conditions, and it worked just as same as with them. When we get to a single node linked list, the stopping condition of merge_sort function should just return the same single node linked list back and rest can go on. I saw this in a tutorial and this was explained like " linked_list can be none when a linked_list having a single node is passed through split" Any help would be highly appreciated

if linked_list == None or linked_list.head == None:
    left_half = linked_list
    right_half = None

This is the whole program

from linked_list import LinkedList

def merge_sort(linked_list):
"""
Sorts a linked list in ascending order
- Recursively divide the linked list into sublists containing a single node
- Repeatedly merge the sublists to produce sorted sublists until one remains

Returns a sorted linked list

Takes O(n log n) time
Takes O(n) space
"""
if linked_list.size() == 1 or linked_list.head is None:
    return linked_list

left_half , right_half = split(linked_list)
left = merge_sort(left_half)
right = merge_sort(right_half)

return merge(left,right)

def split(linked_list):
"""
Divide the unsorted list at midpoint into sublists
Takes O(log n) time
"""
# linked_list can be none when a linked_list having a single node is passed through split
if linked_list == None or linked_list.head == None:
    left_half = linked_list
    right_half = None

    return left_half,right_half

else:
    size = linked_list.size()
    mid = size//2
    mid_node = linked_list.node_at_index(mid-1)

    left_half = linked_list
    right_half =LinkedList()
    right_half.head = mid_node.next_node
    mid_node.next_node = None

    return left_half,right_half

def merge(left,right):
"""
Create a new list that contains the node from left and right
"""
merged = LinkedList()

#fake head to avoid guessing if there is a head in our new linked_list or not
merged.add(0)

current = merged.head
left_head = left.head
right_head = right.head

#iterate over left and right until we reach the tail node of both
while left_head or right_head:
    #if the left_head is None, means we're past the tail node
    # Add the tail node from right to the merged linked list
    if left_head is None:
        current.next_node = right_head
        right_head = right_head.next_node

    elif right_head is None:
        # Add the tail node from left to the merged linked list
        #if the right_head is None, means we're past the tail node
        current.next_node = left_head
        left_head = left_head.next_node

    else:
        #not at either tail node
        #Obtain node data to perform comparison

        left_data = left_head.data
        right_data = right_head.data

        if left_data < right_data:
            current.next_node = left_head
            left_head= left_head.next_node
        else:
            current.next_node = right_head
            right_head = right_head.next_node

    current= current.next_node

#discarding the fake head
head = merged.head.next_node
merged.head = head

return merged

And this is the LinkedList program i am importing from

class Node:
"""
An object for creating a Node
"""

def __init__(self,data,next_node=None):
    self.data = data
    self.next_node = next_node

def __repr__(self):
    return "<Node: %s>"%self.data

class LinkedList:
"""
A (linear data structure)list to maintain the nodes created
The list refers to the first node as head and each node refers to the next node in the list
"""

def __init__(self):
    self.head = None #an empty LinkedList


def is_empty(self):
    return self.head is None


def size(self):
    """
    if current is none, while loop doesn't run, size return 0
    self.head = current, then we increase count and make current = self.next_node,
    when next_node becomes none, which means current becomes 0, it is past the tail, in that case while loop breaks
    """
    current = self.head
    count = 0

    while current:
        count += 1
        current = current.next_node

    return count


def add(self,data):
    """
    prepends the new date to the LinkedList
    """

    new_node = Node(data)
    new_node.next_node = self.head
    self.head = new_node

def search(self,key):
    """
    Searches the data by trasversing through the list. If not found, it returns None
    """
    current = self.head
    while current:
        if current.data == key:
            return current.data
        current = current.next_node
    return None

def insert(self,data,index):
    """
    Inserts the data at a particular index by trasversing through the LinkedList.
    Inserting is efficient as nothin needs to be shifted but finding and going to the index 
    makes it O(n)
    """
    if index == 0:
        self.add(data)
    if index > 0:
        new = Node(data)
        pos = index
        current = self.head

        while pos > 1:
            current = current.next_node
            pos -= 1

        prev_node = current
        next_node = current.next_node

        prev_node.next_node = new
        new.next_node = next_node

def remove(self,key):
    """
    Removes the first element matching the key in the LinkedList
    returns None if the list is empty or if the key is not in the list
    """
    current = self.head
    prev = None
    found = False

    while current and not found:
        if current.data == key and current is self.head: #if the key is at the head
            found = True
            self.head = current.next_node
        elif current.data == key:
            found = True
            prev.next_node = current.next_node
        else:
            prev = current
            current = current.next_node

    return current

def remove_at_index(self,index):
    current = self.head
    pos = index
    while pos > 0:
        current = current.next_node
        pos -= 1
    self.remove(current.data)

def node_at_index(self,index):
    """
    Returning the node at index by counting the number of times we used next_node on current.
    if it becomes equal to the index, we return the current
    """

    if index == 0:
        return self.head
    else:
        current = self.head
        pos = 0
        while pos < index:
            current = current.next_node
            pos += 1

    return current


def __repr__(self):
    """
    returns a string representation of the list
    """
    current = self.head
    ls = []

    while current:
        if current is self.head:
            ls.append('[Head: %s]'%current.data)
        elif current.next_node is None:
            ls.append('[Tail: %s]'%current.data)
        else:
            ls.append('[%s]'%current.data)

        current = current.next_node
    return '->'.join(ls)

Pardon the silly mistakes if any, I am just a beginner.


Solution

  • (In split function) I ran it without these conditions, and it worked just as same as with them

    If you would remove this code:

        if linked_list == None or linked_list.head == None:
            left_half = linked_list
            right_half = None
    
            return left_half,right_half
    
        else:
    

    ... only keeping the code that is in the else block, then it will work for all linked lists (also with a list with just one node as you correctly state), except when its size is 0:

    lst = LinkedList()
    left, right = split(lst)  # AttributeError: 'NoneType' object has no attribute 'next_node'
    

    This test code will then fail because mid_node = linked_list.node_at_index(mid-1) will assign None, as there is no node at index -1. The exception occurs when the code proceeds to evaluate mid_node.next.

    However, I agree that it makes little sense that split would be called with an argument that is None. Yet, the code makes this happen by doing this:

    if linked_list == None or linked_list.head == None:
        left_half = linked_list
        right_half = None   # <--- here
    

    This is bad! If we call split with a linked list that has no nodes, then we get a different value for left_half (empty list) than for right_half (None)! There is absolutely no good reason why it should do that. split should always return a pair of linked list instances. So the code should be corrected to this:

    if linked_list.head == None:
        left_half = linked_list
        right_half = LinkedList()  # <-- create an empty list
    

    Now their comment is no longer true: split cannot return a None list. And so any subsequent call of split with the returned values will be getting a linked list instance (not None). That whole concept of a None list should not be there. An empty list should still be an instance of LinkedList.

    Other remarks

    split mutates the list it gets as argument. In fact, after the call, the input list is identical to the left list that is returned. This is misleading to the uninformed caller. No mention is made of this in the comments. It would be better if the function were called splitOffRightHalf and the function would just return right_half. The caller would then know from the name that the original list will be truncated to its left half.

    Also, to avoid any discussion about a None argument (which really this function should never get), this function should better be a method of the LinkedList class, and only get self as argument. That way it is natural that it is called on an existing linked list instance and no check for None needs to be done.

    I didn't check the rest of the code, but I stumbled over this comment for the split function:

    # Takes O(log n) time
    

    This is not true. split depends on node_at_index which takes O(n) time. This erroneous comment greatly reduces the trust I have in this code.