oracle-database query-optimization sql-view materialized-views

Oracle VERY slow when trying to create a table or a materialized view from 2 existing views

I have a classic Oracle view called STAFFACTIVE which contains around 9500 rows of 10 small native datatype fields (numbers & varchar2(100) only). This view is built over 2 tables coming from a DBLINK. Running in less than a second.

I have another view called DEPT_STAFFACTIVE which contains around 350 rows of 3 fields (1 number and 2 varchar(100)). This view is built over 1 table coming from a local schema. Running in a few milliseconds.

I have created a view joining these 2 views, no problem. Runs in ± 3 seconds :

CREATE OR REPLACE FORCE VIEW V_STAFF
(
    ID,
    LOGIN,
    FIRSTNAME,
    LASTNAME,
    SEARCH_FIRSTNAME,
    SEARCH_LASTNAME,
    PROPER_FIRSTNAME,
    PROPER_LASTNAME,
    EMAIL
)
    BEQUEATH DEFINER
AS
    SELECT pers_id                         AS id,
           logn_login                      AS login,
           rich_first_name            AS firstname,
           rich_family_name           AS lastname,
           search_first_name          AS search_firstname,
           search_family_name         AS search_lastname,
           INITCAP (rich_first_name)  AS proper_firstname,
           INITCAP (rich_family_name) AS proper_lastname,
           email
      FROM staffactive LEFT JOIN dept_staffactive ON staffactive.pers_id = dept_staffactive.pers_id;

But as 3 seconds is a bit too slow, I would like to use a TABLE or MATERIALIZED VIEW instead.

PROBLEM : If a try CREATE TABLE T_STAFF as SELECT * FROM V_STAFF or if I try to create a materialized view based on select * from V_STAFF, then it runs for more than one minute.

The same problem occurs when writing the JOIN in the statement as well create table t_staff as select * from staffactive left join dept_staffactive on staffactive.pers_id = dept_staffactive.pers_id

Is there any reason why? Where is it slowing? Is there a tool to analyse this?

I know it's a bit strange to build the view V_FINAL based on the 2 underlying VIEWS STAFFACTIVE and DEPT_STAFFACTIVE but as they are existing already, why not using them?

Solution

Your materialized view is actually creating a table (a heap segment), writing to the local database. Whenever you write to the database (DML or CTAS), Oracle must make the local database the driving site (and it will ignore attempts to change this with the driving_site hint). This totally changes the execution plan for distributed queries; it is therefore quite common that it performs well as a SELECT but not as a INSERT, CREATE TABLE AS... , MERGE, materialized view, etc... all of which write results to disk.

In this situation Oracle cannot ship the rows from your local view DEPT_STAFFACTIVE over to the remote side for a remote hash join where the remote optimizer can see all its statistics and perform a more efficient join if the remote is the larger of the two inputs. Instead, it has to pull the remote STAFFACTIVE data to the local side and perform the join locally using the optimizer's knowledge from the local side, which is more limited (this is the kind of thing that can change from version to version, of course, but historically this is a common problem). If it is unable to properly estimate row count because of this or because of view complexity (views are worse than tables), it might assume that the cost of moving that data over the network is too much and instead opt for a nested loops operation, which would execute the view code - again and again - for every row in the local DEPT_STAFFACTIVE.

Perhaps even more importantly, the changed execution plan may likely involve a merging of the views together so that each view isn't being executed as a unit as you'd expecte (this can happen with any view, not just ones involving remote tables). It could end up with a mish-mashed table join order taken from both views that produces a temporary many-to-many, or a Cartesian join, or any number of other suboptimal things. Views are hard for the optimizer to predict what to expect so mistakes are often made.

I suggest hinting to force a hash join:

create table t_staff as 
select /*+ no_merge(staffactive) no_merge(dept_staffactive) 
           use_hash(staffactive dept_staffactive) */ 
       [column list]
  from staffactive left join dept_staffactive on staffactive.pers_id = dept_staffactive.pers_id

(Note that your CTAS as written in your question cannot work because * would include pers_id from both views and you can't have two columns with the same name in the target table you're creating, so you'll need to list the columns, at least for one of the views.)

The no_merge hint prevents it from rewriting the view in unexpected ways that result in your other hints being invalid. (I almost always throw this hint in when a view is involved that is joined to something else, because if I use a view it's because I intend it to be run as a unit and view merging almost always does something I don't want it to do, not the least of which is causing other hints to be ignored). The use_hash tells it to read 100% of one view, 100% of the other view, then join them locally to prevent a nested loops situation. Given your tiny row count, that should work quite well.