I have a table without a primary key where more than 13,000 records are duplicates on the OBJID column, but the values in the other columns are different. I want to merge both duplicate records into a single one. Is it possible to do this with a merge statement or do I need to update the table with a loop and fix it manually?
OBJID,INVKLASSE,NETTTYPE,NAVN,ALT_NAVN,ON_ID,ON_TYPE,ON_STED_NR,ON_OVER_ID -219468,NYINVESTERING,,,,,,, -219468,,,,,277615,50,277614,-99 -218981,NYINVESTERING,,,,,,, -218981,,,,,277335,50,277334,-99 -218761,NYINVESTERING,,,,,,, -218761,,,,,277909,50,277908,-99 -218709,NYINVESTERING,,,,,,, -218709,,,,,284646,50,284645,-99
I have created a temporary table and I am trying to insert all duplicate records where OBJID is the primary key.
select * from (
select
objid,INVKLASSE,NETTTYPE,NAVN,ALT_NAVN,ON_ID,ON_TYPE,ON_STED_NR,ON_OVER_ID,
row_number() over(partition by objid order by objid) as DUP
from USEROPENNIS
)
where dup=2;
I have created some SQL like this to separate all duplicate records where DUP=1 or 2. All DUP=1 can be inserted as unique records, but DUP=2 is 13,000 records where I need to update the table where all columns are null or something similar.
I'm looking for suggestions on anything that can quickly fix it.
To collapse duplicate records into one while picking column values from each, you will need to use a GROUP BY
on your primary key combined with aggregate functions (like MAX
) around all your non-key columns, with possible DECODE
/CASE
inside each aggregate that try to decide which values to prioritize.
If your only issue is some rows have NULL
for all the columns and other rows have data, then delete the rows with NULL
values. Or if it's mixed but still only a NULL
vs. non-NULL
issue, MAX
by itself will pick out the non-NULL
value. If it's more complicated than this and you have multiple non-NULL
values to choose from, you have to write the logic to pick the one you want.
At its simplest:
SELECT objid,
MAX(invklasse),MAX(nettype),MAX(navn), etc...
FROM USEROPENNIS
GROUP BY objid
More complex (this can look like anything, this just gives the general idea):
SELECT objid,
MAX(NULLIF(invklasse,'BADVALUE')),
SUBSTR(MIN(DECODE(nettype,'nicevalue','1','lessnicevalue','2','3')||nettype),2) nettype, etc..
FROM USEROPENNIS
GROUP BY objid
Once you get the data the way you want it in your SELECT
query, you place the query you come up with in a CTAS to create a temp table:
CREATE TABLE tmp$1 AS SELECT ...
Now truncate the original table and insert into it the contents of your temporary table:
TRUNCATE TABLE useropennis;
INSERT /*+ append */ INTO useropennis SELECT * FROM tmp$1;
COMMIT;
Now create a PK on it so you don't get into this situation again:
ALTER TABLE useropennis ADD CONSTRAINT pk_useropennis PRIMARY KEY (objid);