Search code examples
c#duplicate-dataduplicatesdelete-row

how to delete duplicate rows from ms access database (C#)


I have been going through various sites and codes, but nothing seems to end my misery. Either they help to find and remove duplicates for a specific column or they remove only from the datatable, not the underlying database itself. I want to delete duplicate rows from table "table1" of my mdb file.

To make my requirements clearer:

  1. there's no primary key set for any column in the table (and I cant afford to)
  2. I want to delete all duplicate rows but one! (order has no significance)
  3. I prefer deleting the duplicates from database more than first checking if such a row exist or not before updating database itself (if that's the last resort, which cant be, then that's welcome)
  4. by duplicate rows I mean rows that are not distinct. for eg, in the following example,only 3rd and 5th row are duplicates. And I want to delete any of them.

          Name1  Name2    Name3
          tom    dick   harry
          tom    dick   mike
          ann    sara   mike
          sara   ann    mike
          ann    sara   mike
    

The duplicate rows should be deleted from database with a button click as follows

     private void button1_Click(object sender, EventArgs e)
     {
         deletedupes();
     }

    private void deletedupes()
    {
        OleDbConnection con = new OleDbConnection("PROVIDER=Microsoft.Jet.OLEDB.4.0; Data Source=C:\\hi.mdb");

        DataSet ds = new DataSet();
        OleDbDataAdapter da = new OleDbDataAdapter("select * from table1", con);
        con.Open();
        da.Fill(ds, "table1");

        // what could be rest of the code??
    }

Thanks in advance. Yes I'm a novice..


Solution

  • As none of the answers were satsifactory for me (I'm just a tad too novice to understand the succint and slightly technicalized way spoken here by more knowledgable and experienced people), i tried my own variant to get this done. I could not follow what to be done with commands like distinct or set rowcount or delete from etc. Nowhere I could find a fully deployed code in an example. So I tried this. From scratch.

        int id, k;
        private void button2_Click(object sender, EventArgs e)
        {
            OleDbConnection con = new OleDbConnection("PROVIDER=Microsoft.Jet.OLEDB.4.0; Data Source=C:\\hi.mdb");
    
            DataSet ds = new DataSet();
    
            OleDbDataAdapter da = new OleDbDataAdapter("select * from table2", con);
            con.Open();
            da.Fill(ds, "table2");
    
    
            for (int i = 0; i < ds.Tables["table2"].Rows.Count; i++)
            {
                DataRow row = ds.Tables["table2"].Rows[i];
                k++;
                for (int j = k; j < ds.Tables["table2"].Rows.Count; j++)
                {
                    DataRow row2 = ds.Tables["table2"].Rows[j];
                    if (row.ItemArray.GetValue(1).ToString() == row2.ItemArray.GetValue(1).ToString())
                    {
                        if (row.ItemArray.GetValue(3).ToString() == row2.ItemArray.GetValue(3).ToString())
                        {
                            id = int.Parse(row2.ItemArray.GetValue(0).ToString());
                            deletedupes(id);
                        }
                    }
                }
            }
    
            con.Close();
        }
    
    
        private void deletedupes(int num)
        {
            OleDbConnection con = new OleDbConnection("PROVIDER=Microsoft.Jet.OLEDB.4.0; Data Source=C:\\hi.mdb");
    
            con.Open();
    
            OleDbCommand c = new OleDbCommand("Delete from table2 where id =?", con);
            c.Parameters.AddWithValue("id", num);
            c.ExecuteNonQuery();
    
            con.Close();
        }
    

    Edit: Sorry, I missed to say that I did use a unique column having a primary key to get this done. Nevertheless, this can be done without that as well. Just a matter of choice. And for unknown reasons, this method seems so fast too..