Search code examples
sqloracle-databaseinsertunion

Oracle SQL delete, insert records, keeping fixed quantity of rows ordered by date


As the title says, I'm trying to delete & insert records to an Historical Table.

The Historical tbl_b has to receive records from a tbl_a, filtered by the column val_id; but there is a limit of records per each val_id, also it must keep only the most recent ones per val_id.

TBL_A
  • It has columns id_val, reg_date, flag
  • it has up to date records (constantly, records are inserted on this table).
  • only records with flag=1 should be inserted on TBL_B.
  • records are deleted by another scheduled process.
val_id reg_date flag
33 2022-10-20 23:00:00 1
22 2022-10-20 22:00:01 0
22 2022-10-20 22:00:02 1
11 2022-10-20 21:00:01 1
11 2022-10-20 21:00:02 1
11 2022-10-20 21:00:03 1
TBL_B:
  • It has columns id_val, reg_date
  • it should store 2 records per id_val and the most recent ones (order by reg_date).
  • it's partitioned monthly, this table will store 150 Million records aprox.
val_id reg_date
11 2022-10-19 11:00:01
11 2022-10-19 11:00:02
22 2022-10-19 12:00:01
22 2022-10-19 12:00:02
Desired Result on TBL_B:
val_id reg_date
11 2022-10-20 21:00:02
11 2022-10-20 21:00:03
22 2022-10-19 12:00:02
22 2022-10-20 22:00:02
33 2022-10-20 23:00:00

To approach this, I'm trying to do it in 2 steps:

  1. first delete records from TBL_B, if it's necessary.
  2. then insert records from TBL_A, since they are always the most recent ones.

But at this moment I'm stuck trying to filter the records that should be deleted from TBL_B.

Link: http://sqlfiddle.com/#!4/73271c/1

DELETE FROM tbl_b where rowid in (
SELECT
  rowid
FROM (SELECT
  m.*,
  ROW_NUMBER() OVER (
  PARTITION BY id_val
  ORDER BY
  reg_date DESC
  ) AS rownumb
FROM (SELECT
  h.*
FROM tbl_b h
LEFT JOIN (SELECT
  *
FROM (SELECT
  tbl_a.*,
  ROW_NUMBER() OVER (
  PARTITION BY id_val
  ORDER BY
  reg_date DESC
  ) AS seqnum
FROM tbl_a
WHERE flag = 1) f
WHERE f.seqnum <= 2) t
  ON t.id_val = h.id_val) m) n
WHERE n.rownumb > 2
);

Any help is appreciated.


Solution

  • Actually, I'd rather do it just the opposite of what you wanted: I'd insert rows first, and delete superfluous rows next. Why? Because - if you're deleting rows first, you have to calculate how many rows to leave in TBL_B so that you'd insert the right number of rows from TBL_A so that TBL_B always contains up to 2 most recent rows. That's just too complex.

    Also, indexes you currently have on these tables are kind of wrong; I'd index id_val and reg_date in both tables; indexing flag (at the first sight, unless there are many flags, not just e.g. 0 and 1) won't help much.

    OK, here's my suggestion: initial TBL_B table contents:

    SQL> select * from tbl_b order by id_val, reg_date;
    
    ID_VAL REG_DATE
    ------ -------------------
    11     2022-10-19 11:00:01
    11     2022-10-19 11:00:02
    22     2022-10-19 12:00:01
    22     2022-10-19 12:00:02
    
    SQL>
    

    Insert first: fetch two most recent rows per id_val - that's the maximum rows you'd have if there are no rows in TBL_B at all:

    SQL>     insert into tbl_b (id_val, reg_date)
      2        with temp as
      3          (select id_val,
      4                  reg_date,
      5                  rank() over (partition by id_val order by reg_date desc) rnk
      6           from tbl_a
      7           where flag = 1
      8          )
      9        select t.id_val, t.reg_date
     10        from temp t
     11        where t.rnk <= 2;
    
    4 rows created.
    

    Delete superfluous rows:

    SQL> delete from tbl_b a
      2  where exists
      3         (select null
      4          from (select b.id_val,
      5                       b.reg_date,
      6                       rank() over (partition by b.id_val order by b.reg_date desc) rnk
      7                from tbl_b b
      8               ) x
      9          where x.id_val = a.id_val
     10            and x.reg_date = a.reg_date
     11            and x.rnk > 2
     12         );
    
    3 rows deleted.
    

    The final result:

    SQL> select * from tbl_b order by id_val, reg_date;
    
    ID_VAL REG_DATE
    ------ -------------------
    11     2022-10-20 11:00:02
    11     2022-10-20 11:00:03
    22     2022-10-19 12:00:02
    22     2022-10-20 22:00:02
    33     2022-10-20 23:00:00
    
    SQL>