Search code examples
c++classlinked-listnodes

How to remove duplicate entries in a linked list input?


I am tasked with removing all duplicate entries in the linked list. Assuming the list is ordered from smallest to largest, therefore all I need to do is remove the duplicates next to each other.

For example: If the list contains {1,2,2,2,3,3,4,5,5,5,5,5,6} before calling remove_duplicates(), it should contain{1,2,3,4,5,6} after.

My problem is that my code will print the duplicates as well whenever remove_duplicates() is called. Here is my code:

list.cpp:

#include <iostream>
using namespace std;
#include "list.h"

// on some machines member variables are not automatically initialized to 0
List::List()
{
    m_head = NULL;
    m_length = 0;
}

// delete all Nodes in the list
// since they are dynamically allocated using new, they won't go away
// automatically when the list is deleted
// Rule of thumb: destructor deletes all memory created by member functions
List::~List()
{
    while (m_head)
    {
        Node *tmp = m_head;
        m_head = m_head->m_next;
        delete tmp;
    }
}

// always insert at the front of the list
// Note: this works even in the SPECIAL CASE that the list is empty
void List::insert(int value)
{
    if (!m_head || value < m_head->m_value)
    {
      m_head = new Node(value, m_head);
    }
    else
    {
      Node *ptr = m_head;
      while (ptr->m_next != NULL && value > ptr->m_next->m_value)
      {
        ptr = ptr->m_next;
      }
      ptr->m_next = new Node(value, ptr->m_next);
    }
    m_length++;
}

// iterate through all the Nodes in the list and print each Node
void List::print()
{
    for (Node *ptr = m_head; ptr; ptr = ptr->m_next)
    {
        cout << ptr->m_value << endl;
    }
}

void List::remove_duplicates()
{
    Node *curr = m_head;
    Node *temp = m_head;
    Node *delPtr = NULL;

    if(curr == NULL)
    {
        return;
    }

    if(curr != NULL)
    {

        if(curr -> m_value == curr ->m_next ->m_value)
       {
           delPtr = curr;
           curr = curr -> m_next;
           temp ->m_next = curr;
           delete delPtr;
       }

       else
       {
           curr = curr -> m_next;
       }

    }

    }

list.h:

class List
{
    public:
        List();
        ~List();

        void insert(int value); // insert at beginning of list
        void print(); // print all values in the list
        int length() {return m_length;}
        void remove_duplicates();
        int *convert_to_array(int &size);

    private:
        class Node
        {
            public:
                Node(int value, Node *next)
                {m_value = value; m_next = next;}
                int m_value;
                Node *m_next;
        };
        Node *m_head;
        int m_length;
};

main.cpp:

#include <assert.h>
#include <iostream>
using namespace std;
#include "list.h"

int main()
{
    List list;
    int value;

    // read values and insert them into list
    while (cin >> value)
    {
      list.insert(value);
    }


    cout << "printing original list: \n";
    list.print();

    list.remove_duplicates();
    cout << "printing list after removing duplicates: \n";
    list.print();
}

Note: Everything has been given to me by my instructor, the only code that I am supposed to write is void List::remove_duplicates()

I have looked up other examples on stackoverflow, but none seem to help me with my situation. Also, I do want to mention that I am still very new to Linked Lists and how they function. Therefore I am not exactly sure where to go from here. Any help would be appreciated


Solution

  • You need to iterate over your list using both the Address of the current pointer, and the current pointer. See Linus on Understanding Pointers. Since you don't just want to delete a node with a certain value, but you want to scan forward in the list deleting ALL nodes with that value, you want to loop over each node, and within the loop scan-forward deleting ALL nodes with the value of the current node.

    You do that by checking the m_next->m_value and if it the same as the current node value, you REPLACE the node at the current pointer ADDRESS with the next node content. Since you still have a pointer to the node you have replaced, you can simply delete that node and then update the pointer from the address of the current node.

    This presumes your list is in SORT-ORDER as you show in your question. If it isn't, you would need to scan the list multiple times. In sort-order as shown, you could do:

    /* list must be in sort order */
    void List::remove_duplicates()
    {
        Node **pp = &m_head,    /* current address of head (pointer-to-pointer) */
              *p = m_head;      /* pointer to head */
        
        /* loop over each node */
        for (; p; pp = &p->m_next, p = p->m_next) {
            /* while m_next and current value == next value */
            while (p->m_next && (*pp)->m_value == p->m_next->m_value)
            {
                *pp = p->m_next;    /* set node at current address to next node */
                delete p;           /* delete original node at that address */
                p = *pp;            /* update pointer to current node address */
            }
        } 
    }
    

    (essentially if the current node and next have the same m_value you are throwing away the current node (and properly deleting the memory) and replacing it with the next)

    Then for example, if your list originally contained:

     0 0 1 1 1 2 3 4 4 5 6 6 6 7 8 9 9 9
    

    calling list.remove_duplicates() would result in:

     0 1 2 3 4 5 6 7 8 9
    

    Look things over and let me know if you have further questions.