c# .net linq datatable iequalitycomparer

Remove duplicates from DataTable and custom IEqualityComparer<DataRow>


How do I implement IEqualityComparer<DataRow> to remove duplicate rows from a DataTable with the following structure:

ID primary key, col_1, col_2, col_3, col_4

The default comparer doesn't work because each row has its own unique primary key.

How can I implement an IEqualityComparer<DataRow> that skips the primary key and compares only the remaining columns?

I have something like this:

public class DataRowComparer : IEqualityComparer<DataRow>
{
 public bool Equals(DataRow x, DataRow y)
 {
  return
   x.ItemArray.Except(new object[] { x[x.Table.PrimaryKey[0].ColumnName] }) ==
   y.ItemArray.Except(new object[] { y[y.Table.PrimaryKey[0].ColumnName] });
 }

 public int GetHashCode(DataRow obj)
 {
  return obj.ToString().GetHashCode();
 }
}

and

public static DataTable RemoveDuplicates(this DataTable table)
{
  return
    (table.Rows.Count > 0) ?
  table.AsEnumerable().Distinct(new DataRowComparer()).CopyToDataTable() :
  table;
}

but it only calls GetHashCode() and never calls Equals().


Solution

  • That is how Distinct works. Internally it calls GetHashCode first, and it only calls Equals for rows whose hash codes match. You can write GetHashCode so that it ignores the primary key. Something like:

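    public int GetHashCode(DataRow obj)
    {
        // Hash every column except the primary key, so rows that differ only by ID hash alike.
        // (Except() is a set operation: it would also drop any value equal to the key and collapse repeats.)
        var key = obj.Table.PrimaryKey[0];
        int hash = 0;
        foreach (DataColumn column in obj.Table.Columns)
        {
            if (column == key) continue;
            // unchecked: let the multiplication wrap instead of throwing on overflow.
            hash = unchecked((hash * 397) ^ obj[column].GetHashCode());
        }
        return hash;
    }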
    

    Since you know your data better, you can probably come up with a better way to generate the hash.
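
    For completeness, here is a rough sketch of a whole comparer, since Equals has to agree with GetHashCode once two hash codes match. Note that the Equals in the question compares two IEnumerable<object> instances with ==, which is a reference comparison and so is always false. The names NonKeyDataRowComparer, NonKeyColumns and DataTableDedupExtensions are made up for illustration, and the sketch assumes both rows come from the same table (or tables sharing the same DataColumn objects):

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Sketch only: treats two rows as equal when every column that is
// not part of the primary key holds an equal value.
public class NonKeyDataRowComparer : IEqualityComparer<DataRow>
{
    // All columns of the row's table except the primary key column(s).
    private static IEnumerable<DataColumn> NonKeyColumns(DataRow row)
    {
        DataColumn[] key = row.Table.PrimaryKey;
        return row.Table.Columns.Cast<DataColumn>().Where(c => !key.Contains(c));
    }

    public bool Equals(DataRow x, DataRow y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        // object.Equals compares the boxed values, not the references.
        // Assumption: x and y belong to the same table.
        return NonKeyColumns(x).All(c => object.Equals(x[c], y[c]));
    }

    public int GetHashCode(DataRow row)
    {
        unchecked // allow the arithmetic to wrap around on overflow
        {
            int hash = 17;
            foreach (DataColumn c in NonKeyColumns(row))
            {
                // Null cells are stored as DBNull.Value, which has a hash code.
                hash = (hash * 397) ^ row[c].GetHashCode();
            }
            return hash;
        }
    }
}

public static class DataTableDedupExtensions
{
    public static DataTable RemoveDuplicates(this DataTable table)
    {
        // CopyToDataTable throws on an empty sequence, hence the guard
        // (same as in the question).
        return table.Rows.Count > 0
            ? table.AsEnumerable().Distinct(new NonKeyDataRowComparer()).CopyToDataTable()
            : table;
    }
}
```

    With this, table.RemoveDuplicates() should keep the first row of each group of rows that differ only in ID. Distinct still calls GetHashCode on every row, but Equals now gives the right answer whenever two hashes collide.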