Search code examples
oraclequery-optimization

How can I improve the view?


this is the view

CREATE OR REPLACE FORCE EDITIONABLE VIEW "MDB7_RES"."V_NODE_FOR_MAPPING" 
(
    "ID", 
    "NODE_ID_FK", 
    "NODEID", 
    "NODENAME", 
    "SAMPLE_REPORT", 
    "EARLY_MDB", 
    "MODULEID", 
    "VERSION_NO", 
    "ITEM_NO", 
    "AI", 
    "SUPPLIER_NO", 
    "COUNT_ADDRESS", 
    "CHANGE_DATE", 
    "ROW_VERSION", 
    "ASSIGNMENT_DATE", 
    "ASSIGNMENT_ID_FK", 
    "ASSIGNMENT_MODE", 
    "MAPPING_DATE"
) 
AS 
SELECT 
    n.ID || a.ID AS ID, 
    n.ID AS node_id_fk, 
    n.nodeid, 
    n.nodename, 
    NVL(m.IS_PRELIMINARY, 0), 
    NVL(m.IS_EARLY , 0), 
    m.moduleid, 
    m.version_no, 
    t.ITEM_NO, 
    t.AI, 
    s.SUPPLIER_NO, 
    s.COUNT_ADDRESS, 
    n.change_date, 
    n.row_version, 
    a.ASSIGNMENT_DATE, 
    a.ID AS ASSIGNMENT_ID_FK, 
    a.ASSIGNMENT_MODE,
    (
        SELECT max(bm.CHANGE_DATE) 
        FROM T_BOM_MAPPING bm 
        JOIN T_BOM_POSITION bp ON bm.BOM_POSITION_ID_FK = bp.ID 
        WHERE bp.SACHNR = t.ITEM_NO
    ) AS MAPPING_DATE
FROM 
    T_NODE n 
    LEFT OUTER JOIN T_MODULE m ON  n.module_id_fk = m.ID 
    JOIN T_BMW_MDB_ASSIGNMENT a ON  a.node_id_fk = n.ID 
    JOIN T_TAIS_PART t ON a.part_id_fk = t.ID 
    JOIN T_GPS_SUPPLIER s ON a.supplier_id_fk = s.ID
WHERE 
    a.cancel_date IS NULL 
    AND (n.state_id_fk = 11 OR (m.IS_EARLY=1 AND n.STATE_ID_FK IN (10,11)))
    AND n.nodetype IN ('C', 'H', 'M');

I tried this version with different order of joining, But got no difference it was almost the same time nothing was changed

CREATE OR REPLACE FORCE EDITIONABLE VIEW "MDB7_RES"."V_NODE_FOR_MAPPING" AS
SELECT 
    n.ID || a.ID AS ID, 
    n.ID AS node_id_fk, 
    n.nodeid, 
    n.nodename, 
    COALESCE(m.IS_PRELIMINARY, 0) AS SAMPLE_REPORT, 
    COALESCE(m.IS_EARLY, 0) AS EARLY_MDB, 
    m.moduleid, 
    m.version_no, 
    t.ITEM_NO, 
    t.AI, 
    s.SUPPLIER_NO, 
    s.COUNT_ADDRESS, 
    n.change_date, 
    n.row_version, 
    a.ASSIGNMENT_DATE, 
    a.ID AS ASSIGNMENT_ID_FK, 
    a.ASSIGNMENT_MODE,
    (
        SELECT MAX(bm.CHANGE_DATE) 
        FROM T_BOM_MAPPING bm 
        JOIN T_BOM_POSITION bp ON bm.BOM_POSITION_ID_FK = bp.ID 
        WHERE bp.SACHNR = t.ITEM_NO
    ) AS MAPPING_DATE
FROM 
    T_BMW_MDB_ASSIGNMENT a 
    JOIN T_NODE n ON a.node_id_fk = n.ID 
    JOIN T_TAIS_PART t ON a.part_id_fk = t.ID 
    JOIN T_GPS_SUPPLIER s ON a.supplier_id_fk = s.ID
    LEFT OUTER JOIN T_MODULE m ON n.module_id_fk = m.ID 
WHERE 
    a.cancel_date IS NULL 
    AND (n.state_id_fk = 11 OR (m.IS_EARLY = 1 AND n.STATE_ID_FK IN (10, 11)))
    AND n.nodetype IN ('C', 'H', 'M');

It's working slowly, because of joins and that extra select as returning value. How can I make it faster ? Also it returns around 1.000.000 of rows So I guess with a lot of data and joins query takes a lot of time.

explain plan is

enter image description here


Solution

  • SQL is a declarative language. You tell the DBMS what data to find, not which algorithm to use to find the data. This means that changing the order of joins must not have any impact on the execution plan, because you still tell the DBMS to get the same data.

    Ideally that would apply to any change on a query - as long as you query the same data, the DBMS should come up with the same plan. In the real world the optimizer is not that smart that it can detect all query re-writes, but Oracle's optimizer is very good at this. Look at the execution plan. It seems the optimizers already sees that your

    SELECT
        ...,
        (
            SELECT max(bm.change_date) 
            FROM t_bom_mapping bm 
            JOIN t_bom_position bp ON bm.bom_position_id_fk = bp.id 
            WHERE bp.sachnr = t.item_no
        ) AS mapping_date
    FROM ...
    

    is the same as

    SELECT
        ...,
        xyz.mapping_date
    FROM ...
    LEFT OUTER JOIN
    (
        SELECT bp.sachnr, max(bm.change_date) AS mapping_date
        FROM t_bom_mapping bm 
        JOIN t_bom_position bp ON bm.bom_position_id_fk = bp.id 
        GROUP BY bp.sachnr
    ) xyz ON xyz.sachnr = t.item_no
    

    This means that usually there is no sense in re-writing a working query in order to get it faster.

    There are exceptions to that rule, because we know that the optimizer is not perfect. It may for instance be that the optimizer can guess very well how many rows it may access via n.state_id_fk = 11, but has problems to guess the amount of rows for (m.IS_EARLY=1 AND n.STATE_ID_FK IN (10,11)) or even the desired combination (n.state_id_fk = 11 OR (m.IS_EARLY=1 AND n.STATE_ID_FK IN (10,11))). In that case it may help to split the query in two parts and thus hinting the optimizer:

    SELECT ...
    WHERE a.cancel_date IS NULL AND n.nodetype IN ('C', 'H', 'M') AND n.state_id_fk = 11
    UNION ALL
    SELECT ...
    WHERE a.cancel_date IS NULL AND n.nodetype IN ('C', 'H', 'M') AND n.state_id_fk = 10 AND m.is_early = 1
    

    This may or may not help the optimizer to get to a better plan. And it may help now, but might become superfluous in a future version of the DBMS.

    What we rather do in order to speed up queries is to provide the appropriate indexes. In your query you have three main limiting criteria (as m.is_early = 1 only applies in a particular context):

    • a.cancel_date IS NULL
    • n.nodetype IN ('C', 'H', 'M')
    • n.state_id_fk IN (10, 11)

    Which is the most limiting criterion? An index makes sense when the accessed data is very small (maybe only 4% of the table rows or even less). Is this true for one of the criteria? Is almost all data canceled except for a very small part? Are the node types some rare types that only occur in 1 percent of the table? Are state IDs 10 and 11 super rare? none of them is really limiting, then the indexes won't help much, because almost all rows must be joined. If you have such a limiting factor, though, create the indexes.

    Let's say state IDs 10 and 11 are really rare. We'll first re-write the WHERE clause to make it plain that these apply to all rows (in case the optimizer doesn't see this itself):

    WHERE n.state_id_fk IN (10, 11)
      AND n.nodetype IN ('C', 'H', 'M')
      AND (n.state_id_fk = 11 OR m.is_early = 1)
      AND a.cancel_date IS NULL 
    

    We see our limiting criteria n.state_id_fk IN (10, 11) and even have criteria on the same table (n.nodetype IN ('C', 'H', 'M')), and we use these to find the t_node rows and especially their ID and their module_id_fk for the joins to the other tables. The index would hence be

    CREATE INDEX idx1 ON t_node (state_id_fk, nodetype, id, module_id_fk);
    

    To get from t_node to t_bmw_mdb_assignment and further from there using its part_id_fk and supplier_id_fk:

    CREATE INDEX idx2 ON t_bmw_mdb_assignment (node_id_fk, part_id_fk, supplier_id_fk);
    

    And to get to t_tais_part and t_gps_supplier you just need their primary key index on ID, which you should already have.