Search code examples
sqloracleplsqlbulkupdate

Should I clear collections every fetch loop to be used in forall? PLSQL Oracle


Lets say I have a huge data(100k-1m rows) in table1 that I need to do some checking then update their status on table2 based on the filtered results. As you can see with my simplified code. I bulk collect 1000 rows per batch and filter them into 3 different collections (dsa1, dsa2, dsa3) then later forall update this 3 collections into table2.

My problem here is that lets say the first fetch gets 100 rows into dsa1, then the second fetch only gets 70 rows into dsa1. When the forall update runs it also updates the old 30 rows in dsa1 from the first fetch.

2 Solutions I think of is first, is delete all elements in collects every fetch loop. and second is to put the forall outside the fetch loop which will make the 3 collections very large but forall is only called once.

The second solution will take up a lot of memory right? please advice what is the best solutions

declare

cursor c1 is 
select t1.id, t1.status, t2.con from table1 t1, table2 t2
where t1.id = t2.id;

type ty_c1 is table of c1%rowtype;
asd1 ty_c1 := ty_c1();

type ty_id is table of c1.id%type index by pls_integer;
dsa1 ty_id;
dsa2 ty_id;
dsa3 ty_id;
begin
    open c1;
    loop
        fetch c1 bulk collect into asd1 limit 1000;
        exit when asd1.count = 0;

        for i in 1 .. asd1.count
        loop
            if (asd1(i).status = 'ACT') then
                dsa1(i).id := asd1(i).id;
            elsif (asd1(i).status = 'NOT ACT') then
                dsa2(i).id := asd1(i).id;
            else
                dsa3(i).id := asd1(i).id;
            end if;
        end loop;

        forall idx in indices of dsa1
            update table2 set con = 'ACTIVE'
            where id = dsa1(idx).id;

        forall idx in indices of dsa2
            update table2 set con = 'NOT ACTIVE'
            where id = dsa2(idx).id;

        forall idx in indices of dsa3
            update table2 set con = 'DEAD'
            where id = dsa3(idx).id;


    end loop;
    close c1;
end;

Solution

  • From a performance perspective, building the three collections and hitting the database only once will be more performant at the expense of using more memory.

    So answer depends on the typical volume you will be processing and available memory.

    You can also have a fourth collection which remains empty and assign the empty collection to dsa1,dsa2 and dsa3 within the loop rather than deleting the entries.

    But looking at your code, since only the status is changing, why not create a collection based on a record of item and status or have a second collection to hold status and then regardless of status, you would be updating 1000 records at a time with a single collection.

    And a step further, since 1M records is nothing for an oracle database, (billions of records are huge, a million records is trivial) just use the database to do a single update select statement

    update table2 t2
    set conn=(select decode(t1.status, 'ACT', 'ACTIVE', 'NOT ACT','NON ACTIVE','DEAD') 
    from table1 t1)
    where t2.id=t1.id
    

    NB rows in t2 that dont exist in t1 will have conn set to null but you can restrict where clauses to limit impact if this is not desired.

    you could also add a parallel hint on both the update and the select to use more CPUs and if you are able to modify the data model, using a numeric code instead of a varchar to represent the status would also be more efficient.