Search code examples
sqloracleoptimizationquery-optimization

Conditionally joining - COALESCE vs OR - Oracle SQL


I am working with a dataset that looks like the following (did not create fiddle since it is so small)

I have a table tblReqs, with a basic structure like this:

| Onum | Pnum | ReqNum |
|:----:|:----:|:------:|
| NULL | P427 | RN1148 |
| NULL | P324 | RN1725 |
| NULL | P229 | RN1242 |
| O396 | NULL | RN1457 |
| O380 | NULL | RN1205 |
| O258 | NULL | RN1482 |

I then have a table of straight numbers, called tblnums, like this:

| nums |
|------|
| O258 |
| O370 |
| O490 |
| O314 |
| O379 |
| P341 |
| P230 |
| P280 |
| P324 |
| P395 |

I need to conditionally join tblnums to tblReqs. Either the Onum or the Pnum in tblReqs would be equal to the nums field in tblnums.

The goal would be a dataset that looks like this:

| nums | ReqNum |
|:----:|:------:|
| O258 | RN1482 |
| O370 |        |
| O490 |        |
| O314 |        |
| O379 |        |
| P341 |        |
| P230 |        |
| P280 |        |
| P324 | RN1725 |
| P395 |        |

I know I can use the OR operator in a join like so:

   SELECT
    tblnums.nums,
    tblReqs.ReqNum

FROM
    tblnums
        LEFT JOIN tblReqs ON tblnums.nums = tblReqs.Onum OR tblnums.nums = tblReqs.Pnum

But I have recently learned about COALESCE:

SELECT
    tblnums.nums,
    tblReqs.ReqNum

FROM
    tblnums
        LEFT JOIN tblReqs ON tblnums.nums = COALESCE(tblReqs.Onum, tblReqs.Pnum)

Is there a preference to one over the other? I understand I could try both on my dataset, but measuring time is not a good metric as it is subject to so many other constraints (DB maintenance, network bandwidth, cpu / ram power, etc). Additionally, I cannot pull many metrics through SQL Developer since my IT team has it locked down.

Which of the two methods is preferred for conditional joins, OR or COALESCE? What is happening from a run-time complexity perspective, etc?


Solution

  • There is an upside to using coalesce (or NVL for that matter) in that you can take advantage of function-based indexes, to improve query performance.

    Setup:

    FSITJA@db01 2019-07-18 09:21:33> create table tblreqs (onum, pnum, reqnum primary key) as
      2  with t (onum, pnum, reqnum) as (
      3  select NULL, 'P427', 'RN1148' from dual union all
      4  select NULL, 'P324', 'RN1725' from dual union all
      5  select NULL, 'P229', 'RN1242' from dual union all
      6  select 'O396', NULL, 'RN1457' from dual union all
      7  select 'O380', NULL, 'RN1205' from dual union all
      8  select 'O258', NULL, 'RN1482' from dual
      9  ) select * from t;
    
    Table created.
    
    FSITJA@db01 2019-07-18 09:21:33> create table tblnums (nums primary key) as
      2  with t (nums)as (
      3  select 'O258' from dual union all
      4  select 'O370' from dual union all
      5  select 'O490' from dual union all
      6  select 'O314' from dual union all
      7  select 'O379' from dual union all
      8  select 'P341' from dual union all
      9  select 'P230' from dual union all
     10  select 'P280' from dual union all
     11  select 'P324' from dual union all
     12  select 'P395' from dual
     13  ) select * from t;
    
    Table created.
    

    Please note how in the option below oracle cannot take advantage of indexes because of the conditional join (TABLE ACCESS STORAGE FULL on TBLREQS):

    FSITJA@db01 2019-07-18 09:21:33> explain plan for
      2  SELECT n.nums,
      3         r.ReqNum
      4    FROM tblnums n
      5    LEFT JOIN tblReqs r ON n.nums = r.Onum OR n.nums = r.pnum;
    
    Explained.
    
    FSITJA@db01 2019-07-18 09:21:33> select * from table(dbms_xplan.display);
    
    PLAN_TABLE_OUTPUT
    ------------------------------------------------------------------------------------------------------------------------------------
    Plan hash value: 1571794044
    
    -----------------------------------------------------------------------------------------------
    | Id  | Operation                   | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
    -----------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT            |                 |    10 |   130 |    31   (0)| 00:00:01 |
    |   1 |  NESTED LOOPS OUTER         |                 |    10 |   130 |    31   (0)| 00:00:01 |
    |   2 |   INDEX FULL SCAN           | SYS_C0047401    |    10 |    50 |     1   (0)| 00:00:01 |
    |   3 |   VIEW                      | VW_LAT_EB747914 |     1 |     8 |     3   (0)| 00:00:01 |
    |*  4 |    TABLE ACCESS STORAGE FULL| TBLREQS         |     1 |    13 |     3   (0)| 00:00:01 |
    -----------------------------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       4 - filter("R"."ONUM" IS NOT NULL AND "N"."NUMS"="R"."ONUM" OR "R"."PNUM" IS NOT
                  NULL AND "N"."NUMS"="R"."PNUM")
    
    17 rows selected.
    

    Now if we create a function-based index using the coalesce function oracle detects that the index exists and will use it to improve the join performance, without need for a full table scan (INDEX RANGE SCAN on the newly created IDX_COALESCE_ONUM_PNUM):

    FSITJA@db01 2019-07-18 09:21:33> create index idx_coalesce_onum_pnum on tblreqs (coalesce(Onum, Pnum));
    
    Index created.
    
    FSITJA@db01 2019-07-18 09:21:33> explain plan for
      2  SELECT n.nums,
      3         r.ReqNum
      4    FROM tblnums n
      5    LEFT JOIN tblReqs r ON n.nums = COALESCE(r.Onum, r.Pnum);
    
    Explained.
    
    FSITJA@db01 2019-07-18 09:21:33> select * from table(dbms_xplan.display);
    
    PLAN_TABLE_OUTPUT
    ------------------------------------------------------------------------------------------------------------------------------------
    Plan hash value: 605824869
    
    ---------------------------------------------------------------------------------------------------------------
    | Id  | Operation                            | Name                   | Rows  | Bytes | Cost (%CPU)| Time     |
    ---------------------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT                     |                        |    10 |   160 |     2   (0)| 00:00:01 |
    |   1 |  NESTED LOOPS OUTER                  |                        |    10 |   160 |     2   (0)| 00:00:01 |
    |   2 |   INDEX FULL SCAN                    | SYS_C0047401           |    10 |    50 |     1   (0)| 00:00:01 |
    |   3 |   TABLE ACCESS BY INDEX ROWID BATCHED| TBLREQS                |     1 |    11 |     1   (0)| 00:00:01 |
    |*  4 |    INDEX RANGE SCAN                  | IDX_COALESCE_ONUM_PNUM |     1 |       |     0   (0)| 00:00:01 |
    ---------------------------------------------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       4 - access("N"."NUMS"="R"."SYS_NC00004$"(+))
    
    16 rows selected.