Search code examples
mysqlsqlinsertduplicatesdata-warehouse

SQL- selecting between duplicate rows by comparing field values


I was wondering if anyone can assist me. I already manage to insert only the rows that are not duplicated

NSERT INTO temp_Log  
SELECT dq1.RowID,dq1.Action FROM DQLog dq1
    WHERE dq1.RowID 
    NOT IN (SELECT RowID FROM DQLog dq2
                group by dq2.RowID
                having count(*)>1

how ever i still need to filter the duplicated ones... how can I select a row by a comparing the value between the duplicated rows in the same table?

RowID  Action 
 1     Reject
 1     Allow
 2     Allow
 2     Fix

In this case i will only need to insert the first (reject+allow= Reject) and last row (Allow+Fix = Fix).

Thank you all for your time.


Solution

  • If it's not possible that a Row_ID has Allow as Action more than once, then following query should work:

    INSERT INTO temp_Log  
    (SELECT distinct RowID,Action           
    FROM DQLog 
    Where Action = 'Reject'
    UNION ALL
    SELECT distinct RowID,Action           
    FROM DQLog 
    Where Action = 'Fix'
    And RowID not in (Select distinct RowID from DQLog where Action='Reject')
    )
    ;
    

    EDIT:

    I've mentioned 2 options in my comment. Here it how it goes:

    Option 1:

    Convert Action columns to lower case values.

    Update DQLog set Action = Lower(Action);
    

    The above query will convert all the values of Action in Lower case. Now you can run following query and it's done:

    INSERT INTO temp_Log  
    (SELECT distinct RowID,Action           
    FROM DQLog 
    Where Action = 'reject'
    UNION ALL
    SELECT distinct RowID,Action           
    FROM DQLog 
    Where Action = 'fix'
    And RowID not in (Select distinct RowID from DQLog where Action='reject')
    )
    ;
    

    Hence your task is done and your redundant data is fixed too.

    Option 2:
    

    Just add the all possible variation of Fix and Reject in where clause of my main query as follows:

    INSERT INTO temp_Log  
    (SELECT distinct RowID,Action           
    FROM DQLog 
    Where Action = 'Reject' or Action='reject'
    UNION ALL
    SELECT distinct RowID,Action           
    FROM DQLog 
    Where Action = 'Fix' or Action = 'fix'
    And RowID not in (Select distinct RowID from DQLog where Action='Reject'or Action='reject')
    )
    ;
    

    I will advice you to go for Option 1 !

    Hope it helps!