I have the following table structure:
table_a
id | customer_id | product_id
---+-------------+------
1 | c1 | p1
2 | c1 | p1
3 | c2 | p1
table_b
id | table_a_id | attribute
---+-------------+------
99 | 1 | a1
98 | 2 | a2
97 | 3 | a3
As you can see table_a
have duplicates values, and I want to merge them.
Unfortunatly the table_a
PK
is also used on table_b
.
The final resoult should be:
table_a
id | customer_id | product_id
---+-------------+------
1 | c1 | p1
3 | c2 | p1
table_b
id | table_a_id | attribute
---+-------------+------
99 | 1 | a1
98 | 1 | a2
97 | 3 | a3
I have to update the table_b
relation with table_a
, and than clear all the unsed keys on table_a
.
Unfortunatly the only query I've thought of is really heavy and the DB timeout before can be completed. table_a
have 200k+ records and table_b
is at least twice that.
My idea was:
table_a
with table_b
, to get: (table_b_id, table_a_customer_id, table_a_product_id)
table_a
. (to get the right id
of table_a
I just used min("id")
table_b
.Here is one option using common table expressions:
with
ta as (
select ta.*, min(id) over(partition by customer_id, product_id) min_id
from table_a ta
),
upd as (
update table_b tb
set table_a_id = ta.min_id
from ta
where tb.table_a_id = ta.id and ta.id <> ta.min_id
)
delete from table_a ta1
using ta
where
ta1.customer_id = ta.customer_id
and ta1.product_id = ta.product_id
and ta1.id > ta.id
The first CTE associates the target id
to each row of table_a
. Then, we use that information to update table_b
. Finally, we delete duplicate rows in table_a
, retaining the earliest id
only.