Search code examples
oracleoracle11g

How to Update/insert records based on WHERE clause in Oracle


I'm looking for query to upsert(update/insert) into Oracle table from pandas dataframe. After searching I got this merge statement which updates the values when the key is present else it will insert.


merge into table_A using dual on ( key_column = '123' )
when matched then 
update set date_column = TO_TIMESTAMP('2021-01-05 00:00:11.644', 'YYYY-MM-DD HH24:MI:SS.FF3'), OR = 'MZM'
when NOT matched then 
insert (key_column, date_column, OR) values( '456',TO_TIMESTAMP('2021-04-05 00:00:11.644', 'YYYY-MM-DD HH24:MI:SS.FF3'),'MZM')

Solution I need

for update

I want to update record based on the condition max(date_column) for that key which should be less than or equal to 30 days from the record date_column I'm inserting.

for insert

I want to insert a new row for a key which is already present but the last available record .i.e date_column for that key should not be within 30 days.

What I tried

merge into table_A using dual on ( key_column = '123' )
when matched then 
update set date_column = TO_TIMESTAMP('2021-12-31 00:00:11.644', 'YYYY-MM-DD HH24:MI:SS.FF3'), ORA = 'MZM'
where (select trunc(TO_TIMESTAMP('2021-01-05 00:00:11.644', 'YYYY-MM-DD HH24:MI:SS.FF3')) - trunc(max(date_column))days from table_A where key = '123') <= 30 
when NOT matched then 
insert (key_column, date_column, ORA) values( '123',TO_TIMESTAMP('2021-12-31 00:00:11.644', 'YYYY-MM-DD HH24:MI:SS.FF3'),'MZM')
where (select trunc(TO_TIMESTAMP('2022-12-31 00:00:11.644', 'YYYY-MM-DD HH24:MI:SS.FF3')) - trunc(max(date_column))days from table_A where key = '123') > 30  ;

But it is not working :(.

I cannot do this from pandas because the data size is huge.

Table recreation script

CREATE TABLE "FDSMLDBUSER"."FDS_upsert" ("key" VARCHAR2(255 BYTE) NOT NULL ENABLE, "date_column" TIMESTAMP (6) NOT NULL ENABLE, "ORA" VARCHAR2(100 BYTE))


insert into "FDSMLDBUSER"."FDS_upsert" ("key", "date_column", ora) values ('123',TO_TIMESTAMP('2021-12-31 00:00:11.644', 'YYYY-MM-DD HH24:MI:SS.FF3'),'MZM')

Can I do this using upsert keyword in oracle where the statement updates when WHERE clause is True or it will insert ?

I appreciate your help so much.

General Question:

Can there be more than one row for a key in the past 30 days in the table? No, we should only have one record per key for past 30 days and that should be the latest one. if any key is older than 30 days then we need to insert new row for that key.

Can there be future dates in the table? Fortunately No.

I use executemany method of cx_oracle to insert records to database where I convert pandas dataframe to dictionary to enable column name inside it. Please find the code below:

connection = cx_Oracle.connect(f'{db_user}/{db_pwd}@{db_host}:{dbport}/{db_service_name}',encoding='UTF-16', nencoding='UTF-16')
cursor = connection.cursor()
parameters = secondary.to_dict(orient='records')
query = config.get('FDS_APAMA_RAWDATA', 'fds_raw_data')
cursor.prepare(query)
cursor.executemany(None,parameters)
for row in cursor:
    print(row[0])
con.commit() 

Solution

  • You can LEFT OUTER JOIN the new data to the existing data and find whether a row exists within 30 days and then, if it does, use the ROW_NUMBER analytic function to find the latest matching row and correlate the update using the ROWID pseudo-column:

    MERGE INTO table_A dst
    USING (
      SELECT d.*,
             a.ROWID AS rid,
             ROW_NUMBER() OVER (ORDER BY a.date_column DESC NULLS LAST) AS rn
      FROM   ( SELECT '123' AS key_column, 
                      TIMESTAMP '2023-01-05 00:00:11.644' AS date_column,
                      'MZM' AS ora
               FROM   DUAL ) d
             LEFT OUTER JOIN table_A a
             ON (    a.key_column = d.key_column
                 AND a.date_column BETWEEN d.date_column - INTERVAL '30' DAY
                                       AND d.date_column + INTERVAL '30' DAY
             )
    ) src
    ON ( src.rid = dst.ROWID AND src.rn = 1)
    WHEN MATCHED THEN
    UPDATE
      SET date_column = src.date_column,
          ora         = src.ora
    WHEN NOT MATCHED THEN
      INSERT (key_column, date_column, ora)
      VALUES (src.key_column, src.date_column, src.ora);
    

    Which, for the sample data:

    CREATE TABLE table_a (
      key_column VARCHAR2(255 BYTE) NOT NULL ENABLE,
      date_column TIMESTAMP (6) NOT NULL ENABLE,
      ORA VARCHAR2(100 BYTE)
    );
    
    INSERT INTO table_a (key_column, date_column, ora)
    values ('123', TIMESTAMP '2022-12-31 00:00:11.644', 'MZM');
    

    The the row updates to:

    KEY_COLUMN DATE_COLUMN ORA
    123 2023-01-05 00:00:11.644000 MZM

    db<>fiddle here