Search code examples
sqlpostgresqlinsertsql-insert

SQL - insert rows preserving overall number of duplicate values


I have a table, say, Product (Id, Name):

Id Name
1 'one'
2 'two'
3 'three'

I have a new version of data that is needed to update the table, with possible duplicate values.

For example:

Name
'one'
'two'
'two'
'two'
'four'

I need the result table to consist of as many rows of each "Name" as there are in the second table, i.e.

Id Name
1 'one'
2 'two'
3 'three'
4 'two'
5 'two'
6 'four'

So, if the value, like 'one' already exists, we don't need to insert a duplicate, but if there are more rows of the same value in the second table, like 'two' or 'four', we need to insert respectively 2 and 1 more values into the table.

How do I do it using SQL?

I've tried answers that suggest inserting into the table without duplicates, which is not my task.


Solution

  • First I set up som tables for the testing:

    create temp table Product(id  integer primary key generated always as identity, name text);
    insert into Product (name) values ('one'),('two'),('three');
    select * from Product;
    
    create  temp table newdata(name text);
    insert into newdata values ('one'),('two'),('two'),('two'),('four');
    select * from newdata;
    

    Then I use the analytical function "row_number()" which creates row numbers on the result. That is just a way of counting how many instances of each name is in your list

    select name,row_number() over (partition by name order by name) rn from newdata;
    
    
    +------+----+
    | name | rn |
    +------+----+
    | four |  1 |
    | one  |  1 |
    | two  |  1 |
    | two  |  2 |
    | two  |  3 |
    +------+----+
    

    For product:

    select name,row_number() over (partition by name order by name) rn from product
    
    
    +-------+----+
    | name  | rn |
    +-------+----+
    | one   |  1 |
    | three |  1 |
    | two   |  1 |
    +-------+----+
    

    Finding the differences between the two tells me which names I am missing:

    select name,row_number() over (partition by name order by name) rn from newdata
    except
    select name,row_number() over (partition by name order by name) rn from product;
    
    
    +------+----+
    | name | rn |
    +------+----+
    | two  |  3 |
    | four |  1 |
    | two  |  2 |
    +------+----+
    

    Then I just need to insert the missing names:

    insert into product (name)
    select name from (
    select name,row_number() over (partition by name order by name) rn from newdata
    except
    select name,row_number() over (partition by name order by name) rn from product
    ) a;
    select * from product
    
    
    +----+-------+
    | id | name  |
    +----+-------+
    |  1 | one   |
    |  2 | two   |
    |  3 | three |
    |  4 | two   |
    |  5 | four  |
    |  6 | two   |
    +----+-------+