Search code examples
rustborrow-checkerownership

What is this rust code to remove duplicates from an unsorted linked list doing on a low-level in the else branch?


I have a linked list:

pub struct Node<T> {
    val: T,
    next: Option<Box<Node<T>>>
}

pub struct LinkedList<T> {
    head: Option<Box<Node<T>>>
}

And my question concerns this function to remove duplicates:

impl<T: Clone + Eq + Hash> LinkedList<T> {
    pub fn remove_dupes(&mut self) {
        let mut set = HashSet::new();
        let mut current = &mut self.head;
        while let Some(mut node) = current.take() { // AO
            if set.contains(&node.val) {
                *current = node.next.take();
            } else {
                set.insert(node.val.clone());
                let next = node.next.take(); // A1 take rest of list. Node.next is now None
                *current = Some(node); // A2 restore ownership of node to current but its now pointing to None.
                 current = &mut current.as_mut().unwrap().next; // *A3* None because (A1) took ownership. *Why is this line necessary?*
                *current = next; // A4 move to next node to continue with rest of list
            }
        }
    }

It works and removes the dupes just like the following does in a way that is simpler to understand:

impl<T: Clone + Eq + Hash> LinkedList<T> {
    pub fn remove_dupes(&mut self) {
        let mut set = HashSet::new();
        let mut current = &mut self.head;
        while let Some(mut node) = current.take() //B0 {
            if set.contains(&node.val) {
                *current = node.next.take();
            } else {
                set.insert(node.val.clone());
                *current = Some(node); // B1
                 current = &mut current.as_mut().unwrap().next; // B2
            }
        }
    }

I'm new to rust and working on understanding what is happening in memory. I don't understand what is the need for line A3 -- current = &mut current.as_mut().unwrap().next; -- and what it is really doing but the function doesn't work without it. Can someone

  1. give a breakdown of what is happening in the first function and why this works to restore ownership to current and move to the next node?
  2. explain why self.head is still pointing to the original head of the list after the function returns, yet is pointing to the end of the list (None), if current = &mut current.as_mut().unwrap().next line is commented out in line A3.

I have a playground with the code here (a bit noisier printing debug statements): https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=0015a922b6cc0f7284c9b37d0dfb95e6

My understanding is as follows and may be flawed:

  1. while let Some(mut node) = current.take() (A0) This line takes ownership of current on each of the iteration of the loop, giving it to the variable "node". current is None as a result.
  2. let next = node.next.take(); (A1) This line takes ownership of node.next and stores it in variable next for later use. node.next is None as a result.
  3. *current = Some(node); (A2) Point current to the node again. current is no longer None. It is now pointing to Node { val: (whatever current value is), next: None}
  4. current = &mut current.as_mut().unwrap().next; (A3) Without this line, the loop will still advance but the linked list will eventually point to the end of the list (None) when the function returns to the caller. The value of current.as_mut().unwrap().next; should always be None here because we took ownership away from node.next in A1.
  5. *current = next; (A4) Point current to the next node that we had stored in next in A1.

Note that commenting out the 2nd to last line (A3) is equivalent to

    pub fn remove_dupes(&mut self) {
        let mut set = HashSet::new();
        let mut current = &mut self.head;
        let mut restore:&mut Option<Box<Node<T>>> = &mut None;
        while let Some(mut node) = current.take() {
            if set.contains(&node.val) {
                *current = node.next.take();
            } else {
                set.insert(node.val.clone());
                let next = node.next.take();
                *restore = Some(node);
                 current = &mut restore.as_mut().unwrap().next;
                *current = next;
            }
        }
    }
    pub fn remove_dupes(&mut self) {
        let mut set = HashSet::new();
        let mut current = &mut self.head;
        while let Some(mut node) = current.take() {
            if set.contains(&node.val) {
                *current = node.next.take();
            } else {
                set.insert(node.val.clone());
                let next = node.next.take();
                *current = Some(node);
                *current = next;
            }
        }
    }

printing the linked list after the call (with my formatter) results in the head == None:

LL(0): |None|

Solution

  • The tricky part here is that after the line current = &mut current.as_mut().unwrap().next, *current will indeed be None, but current does not have a value of None, it is a pointer to the next of the node.

    Given it is a pointer, we can not only observe next with it, we can also change it. Basically, we advance current as the pointer to the current list in the node.

    If we omit this line, current will always stay pointing at head, and all changes applied to it will apply to head. So we will advance through the list but just change head and remove all nodes, causing head to point to the last node, None.


    Note that the following simpler code:

    impl<T: Eq + Hash> LinkedList<T> {
        pub fn remove_dupes(&mut self) {
            let mut set = HashSet::new();
            let mut current = &mut self.head;
            while let Some(node) = current {
                if set.contains(&node.val) {
                    *current = node.next.take();
                } else {
                    set.insert(&node.val);
                    current = &mut node.next;
                }
            }
        }
    }
    

    Should work, but currently does not because of a known shortcoming in the current borrow checker.