Search code examples
sqloracle-databasesql-updatedatabase-partitioning

Which field should I use with Oracle Partition By clause to improve performance


I have an update statement that works fine but takes a very long time to complete.

I'm updating roughly 150 rows in one table with some tens of thousands of rows exposed through a view. It's been suggested that I use the Partition By clause to speed up the process.

I'm not too familiar with Partition By statement but I've been looking around and I think maybe I need to use a field that has a numeric value that can be compared against.

Is this correct? Or can I partition the larger table with something else?

if that is the case I'm struggling with what in the larger table can be used. The table is composed as follows.

ID has a type of NUMBER and creates the unique id for a particular item. Start_Date has a date type and indicates the start when the ID is valid. End date has a date type and indicates the end time when the ID cease to be valid. ID_Type is NVARCHAR2(30) and indicates what type of Identifier we are using. ID_Type2 is NVARCHAR2(30) and indicates what sub_type of Identifier we are using. Identifier is NVARCHAR2(30) and any one ID can be mapped to one or more Identifiers.

So for example - View_ID

ID | Start_Date | End_Date   | ID_Type1| ID_Type2 | Identifier
1  | 2012-01-01 | NULL       | Primary | Tertiary | xyz1
1  | 2012-01-01 | NULL       | Second  | Alpha    | abc2
2  | 2012-01-01 | 2012-01-31 | Primary | Tertiary | ghv2
2  | 2012-02-01 | NULL       | Second  | Alpha    | mno4

Would it be possible to Partition By the ID field of this view as long as there is a clause that the id is valid by date?

The update statement is quite basic although it selects against one of several possible identifiers and and ID_Type1's.

UPDATE Temp_Table t set ID = 
(SELECT DISTINCT ID FROM View_ID v
 WHERE inDate BETWEEN Start_Date and End_Date
 AND v.Identifier = (NVL(t.ID1, NVL(t.ID2, t.ID3)))
 AND v.ID_Type1 in ('Primary','Secondary'));

Thanks in advance for any advice on any aspect of my question.

Additional Info ***

After investigating and following Gordon's advice I changed the update to three updates. This reduced the overall update process 75% going from just over a minute to just over 20 seconds. Thats a big improvement but I'd like to reduce the process even more if possible.

Does anyone think that Partition By clause would help even further? If so what would be the correct method for putting this clause into an update statement. I'm honestly not sure if I understand how this clause operates.

If the UPDATE using a SELECT statement only allows for 1 value to be selected does this exclude something like the following from working?

UPDATE Temp_Table t SET t.ID = 
(SELECT DISTINCT ID,
        Row_Number () (OVER PARTITION BY ID_Type1) AS PT1
 FROM View_ID v
 WHERE inDate BETWEEN v.Start_Date and v.End_Date
 AND v.Identifier = t.ID1
 AND PT1.Row_Number = 1 )

*Solution************

I combined advice from both Responders below to dramatically improve performance. From Gordon I removed the NVL from my UPDATE and changed it to three separate updates. (I'd prefer to combine them into a case but my trials were still slow.)

From Eggi, I looked working with some kind of Materialized view that I can actually index myself and settled on a WITH Clause.

UPDATE Temp_Table t set ID = 
(WITH IDs AS (SELECT /*+ materialize */ DISTINCT ID, Identifier FROM View_ID v
 WHERE inDate BETWEEN Start_Date and End_Date
 AND v.Identifier = ID1)
 SELECT g.ID FROM IDs g
 WHERE g.Identifier = t.ID1;

Thanks again.


Solution

  • It is very hard to imagine how windows/analytic functions would help with this update. I do highly recommend that you learn them, but not for this purpose.

    Perhaps the suggestion was for partitioning the table space, used for the table. Note that this is very different from the "partition by" statement, which usually refers to window/analytic functions. Tablespace partitioning might help performance. However, here is something else you can try.

    I think your problem is the join between the temp table and the view. Presumably, you are creating the temporary table. You should add in a new column, say UsedID, with the definition:

    coalesce(t.ID1, t.ID2, t.ID3) as UsedId
    

    The "WHERE" clause in the update would then be:

    WHERE inDate BETWEEN Start_Date and End_Date AND
          v.Identifier = t.UsedId AND
          v.ID_Type1 in ('Primary', 'Secondary')
    

    I suspect that the performance problem is the use of NVL in the join, which interferes with optimization strategies.

    In response to your comment . . . your original query would have the same problem as this version. Perhaps the logic you want is:

    WHERE inDate BETWEEN Start_Date and End_Date AND
          v.Identifier in (t.ID1, t.ID2, t.ID3) AND
          v.ID_Type1 in ('Primary', 'Secondary')