Search code examples
c#datatablehashtableduplicates

How do I remove duplicates from a datatable altogether based on a column's value?


I have 3 columns in a DataTable

Id Name Count

1 James 4345

2 Kristen 89231

3 James 599

4 Suneel 317113

I need rows 1 and 3 gone, and the new datatable returning only rows 2 and 4. I found a really good related question in the suggestions on SO--this guy. But his solution uses hashtables, and only eliminates row 3, not both 1 and 3. Help!


Solution

  • Okay, so I looked at the blog pointed out to me by Pandiya. In the comments section, a chap called Kevin Morris has posted a solution using a C# Dictionary, which worked for me.

    In my main block, I wrote:

    string keyColumn = "Website";
    RemoveDuplicates(table1, keyColumn);
    

    And my RemoveDuplicates function was defined as:

        private void RemoveDuplicates(DataTable table1, string keyColumn)
    {
        Dictionary<string, string> uniquenessDict = new Dictionary<string, string>(table1.Rows.Count);
        StringBuilder sb = null;
        int rowIndex = 0;
        DataRow row;
        DataRowCollection rows = table1.Rows;
        while (rowIndex < rows.Count - 1)
        {
            row = rows[rowIndex];
            sb = new StringBuilder();
                sb.Append(((string)row[keyColumn]));
            
    
            if (uniquenessDict.ContainsKey(sb.ToString()))
            {
                rows.Remove(row);
                if (RemoveAllDupes)
                {
                    row = rows[rowIndex - 1];
                    rows.Remove(row);
                }
            }
            else
            {
                uniquenessDict.Add(sb.ToString(), string.Empty);
                rowIndex++;
            }
        }
    }
    

    If you go to the blog, you will find a more generic function that allows sniffing dupes over multiple columns. I've added a flag--RemoveAllDupes--in case I want to remove all duplicate rows, but this still assumes that the rows are ordered by name, and involves only duplicates and not triplicates, quadruplicates and so on. If anyone can, please update this code to reflect removal of such.