Search code examples
javaalgorithmsortinglinked-listdoubly-linked-list

Understanding insertion sort for doubly linked list


So I understand how insertion sort works but trying to apply the concept to doubly linked lists fizzles out my brain. Can someone please explain how the algorithm works in this context? I cannot wrap my head around how the node pointers change after each node-by-node comparison, given a pre-existing linked list. I am currently working in Java and referring to this code-example: https://www.geeksforgeeks.org/insertion-sort-doubly-linked-list/

Below are two functions, sortedInsert and insertionSort, where the former is helper function.

What is the else clause doing in sortedInsert? Also why does the author "removing all links so as to create current" in the insertionSort function?

// function to insert a new node in sorted way in
// a sorted doubly linked list
static Node sortedInsert(Node head_ref, Node newNode)
{
    Node current;
 
    // if list is empty
    if (head_ref == null)
        head_ref = newNode;
 
    // if the node is to be inserted at the beginning
    // of the doubly linked list
    else if ((head_ref).data >= newNode.data)
    {
        newNode.next = head_ref;
        newNode.next.prev = newNode;
        head_ref = newNode;
    }
 
    else
    {
        current = head_ref;
 
        // locate the node after which the new node
        // is to be inserted
        while (current.next != null &&
            current.next.data < newNode.data)
            current = current.next;
 
        //Make the appropriate links /
 
        newNode.next = current.next;
 
        // if the new node is not inserted
        // at the end of the list
        if (current.next != null)
            newNode.next.prev = newNode;
 
        current.next = newNode;
        newNode.prev = current;
    }
    return head_ref;
}
 
// function to sort a doubly linked list using insertion sort
static Node insertionSort(Node head_ref)
{
    // Initialize 'sorted' - a sorted doubly linked list
    Node sorted = null;
 
    // Traverse the given doubly linked list and
    // insert every node to 'sorted'
    Node current = head_ref;
    while (current != null)
    {
 
        // Store next for next iteration
        Node next = current.next;
 
        // removing all the links so as to create 'current'
        // as a new node for insertion
        current.prev = current.next = null;
 
        // insert current in 'sorted' doubly linked list
        sorted=sortedInsert(sorted, current);
 
        // Update current
        current = next;
    }
 
    // Update head_ref to point to sorted doubly linked list
    head_ref = sorted;
     
    return head_ref;
}

Solution

  • What is the else clause doing in sortedInsert?

    The first two blocks (before the else) deal with two boundary cases:

    • What to do when the list is empty
    • What to do if the new node must be inserted before the first node of the list.

    So the else block is dealing with all other cases, i.e. where the new node will not be the new head of the list, but the current head node will remain what it was.

    It first finds a node (current) for which the next one holds a value that should come after the new node's value (or alternatively, it has no next node following it). By consequence (and because of the previous boundary case), we then know that the current node's value should come before the new node's value. So if we find such a current, we know that the new node must be inserted between current and current.next.

    In short, the while loop finds the spot where to insert newNode. This is a typical phase in any insertion-sort algorithm.

    The section that is commented "make the appropriate links" will then make up to 4 rewiring assignments.

    Here is an example. Let's say the linked list has 3 values 10, 20 and 30 at the moment sortedInsert is called with a new node that has value 25:

     head_ref
      ↓
    ┌───────────┐    ┌───────────┐    ┌───────────┐
    │data: 10   │    │data: 20   │    │data: 30   │
    │next: ────────> │next: ────────> │next: null │
    │prev: null │    │prev: ─┐   │    │prev: ─┐   │ 
    └───────────┘    └───────│───┘    └───────│───┘
               ^             │  ^             │
               └─────────────┘  └─────────────┘
    
    ┌───────────┐ 
    │data: 25   │
    │next: null │
    │prev: null │
    └───────────┘
      ↑
     newNode
    

    Because head_ref is not null and head_ref.data < newNode.data, we get in the else block where current is defined. The while loop will iterate once, and make the current reference stop at this point:

     head_ref         current
      ↓                ↓
    ┌───────────┐    ┌───────────┐    ┌───────────┐
    │data: 10   │    │data: 20   │    │data: 30   │
    │next: ────────> │next: ────────> │next: null │
    │prev: null │    │prev: ─┐   │    │prev: ─┐   │ 
    └───────────┘    └───────│───┘    └───────│───┘
               ^             │  ^             │
               └─────────────┘  └─────────────┘
    

    Now current.next.data >= newNode.data and so we have found the insertion point for newNode.

    The first rewiring is done with newNode.next = current.next, which results in this:

     head_ref         current
      ↓                ↓
    ┌───────────┐    ┌───────────┐    ┌───────────┐
    │data: 10   │    │data: 20   │    │data: 30   │
    │next: ────────> │next: ────────> │next: null │
    │prev: null │    │prev: ─┐   │    │prev: ─┐   │ 
    └───────────┘    └───────│───┘    └───────│───┘
               ^             │  ^             │ ^
               └─────────────┘  └─────────────┘ │
                                                │
                             ┌───────────┐      │
                             │data: 25   │      │
                             │next: ────────────┘
                             │prev: null │
                             └───────────┘
                               ↑
                              newNode
    

    The next rewiring only happens when current is not the tail node: newNode.next.prev = newNode, which is our case:

     head_ref         current
      ↓                ↓
    ┌───────────┐    ┌───────────┐    ┌───────────┐
    │data: 10   │    │data: 20   │    │data: 30   │
    │next: ────────> │next: ────────> │next: null │
    │prev: null │    │prev: ─┐   │    │prev: ─┐   │ 
    └───────────┘    └───────│───┘    └───────│───┘
               ^             │                │ ^
               └─────────────┘                │ │
                                              │ │
                             ┌───────────┐    │ │
                             │data: 25   │ <──┘ │
                             │next: ────────────┘
                             │prev: null │
                             └───────────┘
                               ↑
                              newNode
    

    At this stage, the new node is correctly linked with the node that should follow it (the one with value 30). Now remain two more assignments to establish the link between the node that should precede the new node (the one with value 20). First assignment is current.next = newNode, which results in:

     head_ref         current
      ↓                ↓
    ┌───────────┐    ┌───────────┐       ┌───────────┐
    │data: 10   │    │data: 20   │       │data: 30   │
    │next: ────────> │next: ────────┐    │next: null │
    │prev: null │    │prev: ─┐   │  │    │prev: ─┐   │ 
    └───────────┘    └───────│───┘  │    └───────│───┘
               ^             │      │            │ ^
               └─────────────┘      │            │ │
                                    v            │ │
                                 ┌───────────┐   │ │
                                 │data: 25   │ <─┘ │
                                 │next: ───────────┘
                                 │prev: null │
                                 └───────────┘
                                  ↑
                                 newNode
    

    And finally the rewiring is completed with newNode.prev = current:

     head_ref         current
      ↓                ↓
    ┌───────────┐    ┌───────────┐       ┌───────────┐
    │data: 10   │    │data: 20   │       │data: 30   │
    │next: ────────> │next: ────────┐    │next: null │
    │prev: null │    │prev: ─┐   │  │    │prev: ─┐   │ 
    └───────────┘    └───────│───┘  │    └───────│───┘
               ^             │ ^    │            │ ^
               └─────────────┘ │    │            │ │
                               │    v            │ │
                               │ ┌───────────┐   │ │
                               │ │data: 25   │ <─┘ │
                               │ │next: ───────────┘
                               │ │prev: ─┐   │
                               │ └───────│───┘
                               └─────────┘
                                  ↑
                                 newNode
    

    This is no different then:

     head_ref         current          newNode
      ↓                ↓                ↓
    ┌───────────┐    ┌───────────┐    ┌───────────┐    ┌───────────┐
    │data: 10   │    │data: 20   │    │data: 25   │    │data: 30   │
    │next: ────────> │next: ────────> │next: ────────> │next: null │
    │prev: null │    │prev: ─┐   │    │prev: ─┐   │    │prev: ─┐   │ 
    └───────────┘    └───────│───┘    └───────│───┘    └───────│───┘
               ^             │  ^             │  ^             │  
               └─────────────┘  └─────────────┘  └─────────────┘
    

    The caller gets back the head reference head_ref, which actually didn't change. It would only change if the new node became the first node in the list, which is what the first two if blocks deal with.

    Also why does the author "removing all links so as to create current" in the insertionSort function?

    This is just a clean way to extract the node from the input list: it is no longer a part of it, and is ready to become part of the new list (sorted). Technically this is not necessary for the case described above, since there both the prev and next members of newNode get new references assigned anyway, but it remains important for the first two boundary cases, as there we do not assign null values where they are actually needed.

    Alternatively, you could assign those null references in sortedInsert.

    Hope this clarifies it.