Search code examples
sqlsql-serverduplicatespartition

Removing duplicates rows based off one column SQL


I need to clear some partial duplicate rows. In this case, the columns InvoiceTaskID have a few duplicate values and I want to clear them.

I have tried partitioning based of some of the solutions here but I keep getting row number = 1 for every single row when I execute.

I don't have a primary key to work with

Note: the column dbo.D.duplicate is a count of how many times the duplicates in appear in the column InvoiceTaskID

Does anyone have a solutions?

 select    dbo.A.TaskId,
           dbo.A.InvoiceTaskId,
           dbo.A.Client,
           dbo.Acd.InvoiceId,
           dbo.A.Date,
           dbo.A.JobId,
           dbo.A.TaskName,
           dbo.A.StaffName,
           dbo.A.BillableRate,
           dbo.A.Hrs,
           dbo.A.Amount,
           dbo.D.duplicate,
    
           ROW_NUMBER() over (partition by dbo.A.TaskId,
           dbo.A.InvoiceTaskId,
           dbo.A.Client,
           dbo.A.InvoiceId,
           dbo.A.Date,
           dbo.A.JobId,
           dbo.A.TaskName,
           dbo.A.StaffName,
           dbo.A.BillableRate,
           dbo.A.Hrs,
           dbo.A.Amount,
           dbo.D.duplicate
    order by dbo.A.TaskId,
           dbo.A.InvoiceTaskId,
           dbo.A.Client,
           dbo.A.InvoiceId,
           dbo.A.Date,
           dbo.A.JobId,
           dbo.A.TaskName,
           dbo.A.StaffName,
           dbo.A.BillableRate,
           dbo.A.Hrs,
           dbo.A.Amount,
           dbo.D.duplicate) RowNumber
    from dbo.A
    INNER JOIN dbo.D ON dbo.D.TaskNo = dbo.A.InvoiceTaskId

Solution

  • You can delete the records you obtained using WITH your query.

    WITH RECORDS AS (
    select    dbo.A.TaskId,
               dbo.A.InvoiceTaskId,
               dbo.A.Client,
               dbo.Acd.InvoiceId,
               dbo.A.Date,
               dbo.A.JobId,
               dbo.A.TaskName,
               dbo.A.StaffName,
               dbo.A.BillableRate,
               dbo.A.Hrs,
               dbo.A.Amount,
               dbo.D.duplicate,
        
               ROW_NUMBER() over (partition by dbo.A.TaskId,
               dbo.A.InvoiceTaskId,
               dbo.A.Client,
               dbo.A.InvoiceId,
               dbo.A.Date,
               dbo.A.JobId,
               dbo.A.TaskName,
               dbo.A.StaffName,
               dbo.A.BillableRate,
               dbo.A.Hrs,
               dbo.A.Amount,
               dbo.D.duplicate
        order by dbo.A.TaskId,
               dbo.A.InvoiceTaskId,
               dbo.A.Client,
               dbo.A.InvoiceId,
               dbo.A.Date,
               dbo.A.JobId,
               dbo.A.TaskName,
               dbo.A.StaffName,
               dbo.A.BillableRate,
               dbo.A.Hrs,
               dbo.A.Amount,
               dbo.D.duplicate) RowNumber
        from dbo.A
        INNER JOIN dbo.D ON dbo.D.TaskNo = dbo.A.InvoiceTaskId
    ) DELETE FROM RECORDS WHERE RowNumber > 1