Search code examples
oracledatabase-performance

Oracle 11gR2 - View Function Columns Evaluation


I seem to have an odd issue regarding an Oracle view that has functions defined for columns and when those functions are evaluated.

Let's say I have the following view and function definition:

CREATE OR REPLACE VIEW test_view_one AS
SELECT column_one,
       a_package.function_that_returns_a_value(column_one) function_column
FROM   a_table;

CREATE OR REPLACE PACKAGE BODY a_package AS 
    FUNCTION function_that_returns_a_value(p_key  VARCHAR2) RETURN VARCHAR2 IS
       CURSOR a_cur IS
          SELECT value
            FROM table_b
           WHERE key = p_key;
      p_temp   VARCHAR2(30);
    BEGIN
    -- Code here to write into a temp table. The function call is autonomous.
      OPEN a_cur;
      FETCH a_cur INTO p_temp;
      CLOSE a_cur;

      RETURN p_temp;
    END function_that_returns_a_value;
END a_package;

In general, I would expect that if function_column is included in a query then for every row brought back by that query, the function would be run. This seems to be true in some circumstances but not for others.

For example, let's say I have the following:

SELECT pageouter,* 
FROM(WITH page_query AS (SELECT * 
                           FROM test_view_one
                         ORDER BY column_one)
SELECT page_query.*, ROWNUM as innerrownum
FROM page_query
WHERE rownum <= 25) pageouter WHERE pageouter.innerrownum >= 1

In this scenario, that inner query (the one querying test_view_one) brings back around 90,000 records. If I define the function as inserting into a temporary table then I can tell that the function ran 25 times, once for each row brought back. Exactly what I would expect.

However, if I add a significant where clause on to that inner query, e.g.

SELECT pageouter,* 
  FROM(WITH page_query AS (SELECT * 
                             FROM test_view_one
                            WHERE EXISTS (SELECT 'x' FROM some_table WHERE ...)
                            AND NOT EXISTS (SELECT 'x' FROM some_other_table WHERE ...)
                            AND EXISTS (SELECT 'x' FROM another_table WHERE ...)
                           ORDER BY column_one)
  SELECT page_query.*, ROWNUM as innerrownum
  FROM page_query
  WHERE rownum <= 25) pageouter WHERE pageouter.innerrownum >= 1

Then the number of rows being brought back by the inner query is 60,000 and if I then query the temporary table, I can tell the function has run 60,000 times. Unsurprisingly, this pretty much destroys performance of the query.

The queries above are run as part of a paging implementation which is why we only ever bring back 25 rows and is why we only ever need the functions to be run for those 25 rows.

I should add, if I change the WHERE clause (i.e. I remove some of the conditions) then the query goes back to behaving it self, only running the functions for the 25 rows that are actually brought back.

Does anyone have any idea as to when functions in views are evaluated? Or anyway in determining what causes it or a way of identifying when the functions are evaluated (I've checked the explain plan and there's nothing in there which seems to give it away). If I knew that then I could hopefully find a solution to the problem but there seems to be little documentation other than "They'll run for each row brought back" which is clearly not the case in some scenarios.

I fully appreciate it's difficult to work out what's going on without a working schema but if you need anymore info then please feel free to ask.

Many Thanks


Additional Info as Requested.

Below is the actual explain plan that I get out of the production environment. The table names don't match the above query (in fact there's considerably more tables involved but they're all joined by NOT EXISTS statements within the WHERE clause.) The DEMISE table, is the equivalent of the A_TABLE in the above query.

It's worth noting that stats were gathered just before I ran the explain plan to make it as accurate as possible.

My understanding of this is that the VIEW row is where the functions would be evaluated, which occurs AFTER the rows have been filtered down. My understanding is obviously flawed!

So this is the bad plan, the one that calls the function 60,000 times...

Execution Plan
----------------------------------------------------------

-------------------------------------------------------------------------------------------
| Id  | Operation                              | Name        | Rows  | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |             |     5 | 10230 |   984   (1)|
|   1 |  FAST DUAL                             |             |     1 |       |     2   (0)|
|   2 |  FAST DUAL                             |             |     1 |       |     2   (0)|
|*  3 |  VIEW                                  |             |     5 | 10230 |   984   (1)|
|*  4 |   COUNT STOPKEY                        |             |       |       |            |
|   5 |    VIEW                                |             |     5 | 10165 |   984   (1)|
|*  6 |     SORT ORDER BY STOPKEY              |             |     5 |   340 |   984   (1)|
|   7 |      COUNT                             |             |       |       |            |
|*  8 |       FILTER                           |             |       |       |            |
|*  9 |        HASH JOIN RIGHT OUTER           |             |  5666 |   376K|   767   (1)|
|* 10 |         INDEX RANGE SCAN               | USERDATAI1  |     1 |    12 |     2   (0)|
|* 11 |         HASH JOIN RIGHT ANTI           |             |  5666 |   309K|   765   (1)|
|* 12 |          INDEX FAST FULL SCAN          | TNNTMVINI1  |     1 |    17 |    35   (0)|
|* 13 |          HASH JOIN RIGHT ANTI          |             |  6204 |   236K|   729   (1)|
|* 14 |           INDEX RANGE SCAN             | CODESGENI3  |     1 |    10 |     2   (0)|
|* 15 |           INDEX FULL SCAN              | DEMISEI4    |  6514 |   184K|   727   (1)|
|  16 |            NESTED LOOPS                |             |     1 |    25 |     3   (0)|
|  17 |             NESTED LOOPS               |             |     1 |    25 |     3   (0)|
|* 18 |              INDEX RANGE SCAN          | PROPERTY_GC |     1 |    15 |     2   (0)|
|* 19 |              INDEX UNIQUE SCAN         | CODESGENI1  |     1 |       |     0   (0)|
|* 20 |             TABLE ACCESS BY INDEX ROWID| CODESGEN    |     1 |    10 |     1   (0)|
|  21 |        TABLE ACCESS FULL               | QCDUAL      |     1 |       |     3   (0)|
|* 22 |        INDEX RANGE SCAN                | DMSELEASI4  |     1 |    21 |     2   (0)|
|* 23 |        INDEX RANGE SCAN                | TNNTMVINI1  |     1 |    17 |     1   (0)|
|  24 |        TABLE ACCESS FULL               | QCDUAL      |     1 |       |     3   (0)|
-------------------------------------------------------------------------------------------

This is the good plan. This calls the function 25 times but has some of the not exists statements removed from the where clause.

Execution Plan
----------------------------------------------------------

----------------------------------------------------------------------------------------
| Id  | Operation                            | Name       | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |            |    25 | 54200 |   144   (0)|
|   1 |  FAST DUAL                           |            |     1 |       |     2   (0)|
|   2 |  FAST DUAL                           |            |     1 |       |     2   (0)|
|*  3 |  VIEW                                |            |    25 | 54200 |   144   (0)|
|*  4 |   COUNT STOPKEY                      |            |       |       |            |
|   5 |    VIEW                              |            |    26 | 56030 |   144   (0)|
|   6 |     COUNT                            |            |       |       |            |
|*  7 |      FILTER                          |            |       |       |            |
|   8 |       NESTED LOOPS ANTI              |            |    30 |  3210 |   144   (0)|
|   9 |        NESTED LOOPS OUTER            |            |    30 |  2580 |   114   (0)|
|  10 |         NESTED LOOPS ANTI            |            |    30 |  2220 |    84   (0)|
|  11 |          NESTED LOOPS ANTI           |            |    32 |  1824 |    52   (0)|
|  12 |           TABLE ACCESS BY INDEX ROWID| DEMISE     |   130K|  5979K|    18   (0)|
|  13 |            INDEX FULL SCAN           | DEMISEI4   |    34 |       |     3   (0)|
|* 14 |           INDEX RANGE SCAN           | CODESGENI3 |     1 |    10 |     1   (0)|
|* 15 |          INDEX RANGE SCAN            | TNNTMVINI1 |     1 |    17 |     1   (0)|
|* 16 |         INDEX RANGE SCAN             | USERDATAI1 |     1 |    12 |     1   (0)|
|* 17 |        INDEX RANGE SCAN              | DMSELEASI4 |     1 |    21 |     1   (0)|
|  18 |       TABLE ACCESS FULL              | QCDUAL     |     1 |       |     3   (0)|
----------------------------------------------------------------------------------------

I fully appreciate the second plan is doing less but that doesn't explain why the functions aren't being evaluated... at least not that I can work out.


Solution

  • The Pagination with ROWNUM may be performed in two ways:

    A) full scan the row source with optimized sorting (limited to the top N rows) or

    B) index access of the row source with no sort at all

    Here simplified example of case A

     SELECT *
     FROM
       (SELECT a.*,
         ROWNUM rnum
       FROM
         ( SELECT * FROM test_view_one ORDER BY id
         ) a
       WHERE ROWNUM <= 25
       )
     WHERE rnum >= 1
    

    The corresponding execution plan looks as follows (Note that I presend also part of column projection - I will soon explain why):

     -----------------------------------------------------------------------------------------
     | Id  | Operation                | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
     -----------------------------------------------------------------------------------------
     |   0 | SELECT STATEMENT         |      |    25 |   975 |       |  1034   (1)| 00:00:01 |
     |*  1 |  VIEW                    |      |    25 |   975 |       |  1034   (1)| 00:00:01 |
     |*  2 |   COUNT STOPKEY          |      |       |       |       |            |          |
     |   3 |    VIEW                  |      | 90000 |  2285K|       |  1034   (1)| 00:00:01 |
     |*  4 |     SORT ORDER BY STOPKEY|      | 90000 |   439K|  1072K|  1034   (1)| 00:00:01 |
     |   5 |      TABLE ACCESS FULL   | TEST | 90000 |   439K|       |   756   (1)| 00:00:01 |
     -----------------------------------------------------------------------------------------
    
    
     Column Projection Information (identified by operation id):
     -----------------------------------------------------------
     ... 
        3 - "A"."ID"[NUMBER,22], "A"."FUNCTION_COLUMN"[NUMBER,22]
        4 - (#keys=1) "ID"[NUMBER,22], "MY_PACKAGE"."MY_FUNCTION"("ID")[22]
        5 - "ID"[NUMBER,22]     
    

    Within the execution the table is accessed with FULL SCAN, i.e. all records are red. The optimization is in the SORT operation: SORT ORDER BY STOPKEY means that not all rows are sorted, but only the top 25 are kept and sortet.

    Here the execution plan for case B

     --------------------------------------------------------------------------------
     | Id  | Operation           | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
     --------------------------------------------------------------------------------
     |   0 | SELECT STATEMENT    |          |    25 |   975 |     2   (0)| 00:00:01 |
     |*  1 |  VIEW               |          |    25 |   975 |     2   (0)| 00:00:01 |
     |*  2 |   COUNT STOPKEY     |          |       |       |            |          |
     |   3 |    VIEW             |          |    26 |   676 |     2   (0)| 00:00:01 |
     |*  4 |     INDEX RANGE SCAN| TEST_IDX |    26 |   130 |     2   (0)| 00:00:01 |
     --------------------------------------------------------------------------------
    

    Here are accessed only the required 25 rows and therefore the function can't be called more that the N times.

    Now the important consideration, in case A, the function can, but need not be called for each row. How do we see it?

    The answer is in the column projection in the explain plan.

        4 - (#keys=1) "ID"[NUMBER,22], "MY_PACKAGE"."MY_FUNCTION"("ID")[22]
    

    The relevant line 4 show, that the function is called in the SORT operation and therefor for each line. (Sort gets all the rows).

    Conclusion

    My test on 11.2 shows that in case A (FULL SCAN with SORT ORDER BY STOPKEY) the view function is called once per each row. I guess the only workaround is to select only the ID, limit the result and than join back the original view to get the function value.

    Final notes

    I tested this in 12.1 as well and see below the shift in the column projection. The function is calculated first in the VIEW (line 3), i.e. both cases works fine.

     Column Projection Information (identified by operation id):
     -----------------------------------------------------------
     ...
        3 - "A"."ID"[NUMBER,22], "A"."FUNCTION_COLUMN"[NUMBER,22]
        4 - (#keys=1) "ID"[NUMBER,22]
        5 - "ID"[NUMBER,22]        
    

    And of course in 12c the new feature of OFFSET - FETCH NEXT could be used.

    Good Luck!