Search code examples
sqlsql-servergroup-byduplicatesouter-join

How to delete duplicates if all the their linked values in another table (foreign key linked rows) are also the same


I have three tables:


x (column1,..., x_id)


y (column1, column2, colum3, ..., x_id, y_id)


z (column1, column2, colum3,..., x_id, y_id, z_id)

My final goal: delete duplicates in y table if all of their linked values (based on foreign keys) in z table are also exactly the same. For example, This is just an example.

For more clarification consider this example: y table: my table structure is very similar to this and this table is close to the reality of what I have.

column 1 column 2 x_id y_id
foo bar 1 1
foo bar 1 2
xx yy 2 3
zz kk 2 4
tt mm 2 5
baz qux 3 6
baz qux 3 7

has two groups with the same column 1 and column 2 and x_id values (the y_ids = (1 and 2) and y_id == (6,7).

Then for these rows, I want to check a second table to see if all the items are the same. meaning they have the same items in the z table. The items of the y_id = 1 and y_id = 2 are the same, so one of them in the y table should be deleted.

{color:red;} column 1 column 2 column x_id y_id z_id
foo bar qux 1 1 1
foo2 bar2 qux2 1 1 2
foo bar qux 1 2 3
foo2 bar2 qux2 1 2 4
baz foo qux 3 6 5
baz2 foo2 qux1 3 6 6
baz3 foo3 qux2 3 6 7
baz foo qux 3 7 8
baz2 foo2 qux1 3 7 9

I want to delete the row with y_id = 1 from the y table ( or y_id = 2, which duplicate row doesn't make a difference). The tables are very big so a fast solution would be great. I know I can join the tables, but I was hoping I could find another solution. A solution that can use the group by IDs instead and just look for the found items there inside the z table (third table)


Solution

  • I believe the below example will help me to delete the rows I want:

    WITH cte AS (
        SELECT
            a.column1 AS a_column1,
            a.column2 AS a_column2,
            a.y_id AS a_y_id,
            a.x_id AS a_x_id,
            b.column1 AS a_column1,
            b.colum2 AS a_column2,
            b.y_id AS b_y_id ,
            b.x_id AS b_x_id 
        FROM YTable a
        JOIN YTable b
        ON a.column1 = b.column1 
            AND a.column2 = b.column2 
            AND a.x_id = b.x_id 
            AND a.y_id != b.y_id
        WHERE a.y_id < b.y_id
    )
    SELECT
        cte.*,
        CASE
            WHEN EXISTS (
                SELECT 1
                FROM (
                    SELECT * FROM ZTable
                    WHERE y_id = cte.y_id
                ) table1
                FULL JOIN (
                    SELECT * FROM ZTable
                    WHERE y_id = cte.y_id
                ) table2
                ON table1.colum1 = table2.column1
                    AND table1.colum2 = table2.colum2
                    AND table1.colum3 = table2.colum3
                WHERE table1.column1 IS NULL OR table2.column1 IS NULL
            )
            THEN 'Not Identical'
            ELSE 'Identical'
        END AS duplicated
    FROM cte;