I am working with a dataset that looks like the following (did not create fiddle since it is so small)
I have a table tblReqs
, with a basic structure like this:
| Onum | Pnum | ReqNum |
|:----:|:----:|:------:|
| NULL | P427 | RN1148 |
| NULL | P324 | RN1725 |
| NULL | P229 | RN1242 |
| O396 | NULL | RN1457 |
| O380 | NULL | RN1205 |
| O258 | NULL | RN1482 |
I then have a table of straight numbers, called tblnums
, like this:
| nums |
|------|
| O258 |
| O370 |
| O490 |
| O314 |
| O379 |
| P341 |
| P230 |
| P280 |
| P324 |
| P395 |
I need to conditionally join tblnums
to tblReqs
. Either the Onum
or the Pnum
in tblReqs
would be equal to the nums
field in tblnums
.
The goal would be a dataset that looks like this:
| nums | ReqNum |
|:----:|:------:|
| O258 | RN1482 |
| O370 | |
| O490 | |
| O314 | |
| O379 | |
| P341 | |
| P230 | |
| P280 | |
| P324 | RN1725 |
| P395 | |
I know I can use the OR
operator in a join like so:
SELECT
tblnums.nums,
tblReqs.ReqNum
FROM
tblnums
LEFT JOIN tblReqs ON tblnums.nums = tblReqs.Onum OR tblnums.nums = tblReqs.Pnum
But I have recently learned about COALESCE
:
SELECT
tblnums.nums,
tblReqs.ReqNum
FROM
tblnums
LEFT JOIN tblReqs ON tblnums.nums = COALESCE(tblReqs.Onum, tblReqs.Pnum)
Is there a preference to one over the other? I understand I could try both on my dataset, but measuring time is not a good metric as it is subject to so many other constraints (DB maintenance, network bandwidth, cpu / ram power, etc). Additionally, I cannot pull many metrics through SQL Developer since my IT team has it locked down.
Which of the two methods is preferred for conditional joins, OR
or COALESCE
? What is happening from a run-time complexity perspective, etc?
There is an upside to using coalesce (or NVL for that matter) in that you can take advantage of function-based indexes, to improve query performance.
Setup:
FSITJA@db01 2019-07-18 09:21:33> create table tblreqs (onum, pnum, reqnum primary key) as
2 with t (onum, pnum, reqnum) as (
3 select NULL, 'P427', 'RN1148' from dual union all
4 select NULL, 'P324', 'RN1725' from dual union all
5 select NULL, 'P229', 'RN1242' from dual union all
6 select 'O396', NULL, 'RN1457' from dual union all
7 select 'O380', NULL, 'RN1205' from dual union all
8 select 'O258', NULL, 'RN1482' from dual
9 ) select * from t;
Table created.
FSITJA@db01 2019-07-18 09:21:33> create table tblnums (nums primary key) as
2 with t (nums)as (
3 select 'O258' from dual union all
4 select 'O370' from dual union all
5 select 'O490' from dual union all
6 select 'O314' from dual union all
7 select 'O379' from dual union all
8 select 'P341' from dual union all
9 select 'P230' from dual union all
10 select 'P280' from dual union all
11 select 'P324' from dual union all
12 select 'P395' from dual
13 ) select * from t;
Table created.
Please note how in the option below oracle cannot take advantage of indexes because of the conditional join (TABLE ACCESS STORAGE FULL on TBLREQS):
FSITJA@db01 2019-07-18 09:21:33> explain plan for
2 SELECT n.nums,
3 r.ReqNum
4 FROM tblnums n
5 LEFT JOIN tblReqs r ON n.nums = r.Onum OR n.nums = r.pnum;
Explained.
FSITJA@db01 2019-07-18 09:21:33> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1571794044
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 130 | 31 (0)| 00:00:01 |
| 1 | NESTED LOOPS OUTER | | 10 | 130 | 31 (0)| 00:00:01 |
| 2 | INDEX FULL SCAN | SYS_C0047401 | 10 | 50 | 1 (0)| 00:00:01 |
| 3 | VIEW | VW_LAT_EB747914 | 1 | 8 | 3 (0)| 00:00:01 |
|* 4 | TABLE ACCESS STORAGE FULL| TBLREQS | 1 | 13 | 3 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - filter("R"."ONUM" IS NOT NULL AND "N"."NUMS"="R"."ONUM" OR "R"."PNUM" IS NOT
NULL AND "N"."NUMS"="R"."PNUM")
17 rows selected.
Now if we create a function-based index using the coalesce function oracle detects that the index exists and will use it to improve the join performance, without need for a full table scan (INDEX RANGE SCAN on the newly created IDX_COALESCE_ONUM_PNUM):
FSITJA@db01 2019-07-18 09:21:33> create index idx_coalesce_onum_pnum on tblreqs (coalesce(Onum, Pnum));
Index created.
FSITJA@db01 2019-07-18 09:21:33> explain plan for
2 SELECT n.nums,
3 r.ReqNum
4 FROM tblnums n
5 LEFT JOIN tblReqs r ON n.nums = COALESCE(r.Onum, r.Pnum);
Explained.
FSITJA@db01 2019-07-18 09:21:33> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 605824869
---------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 160 | 2 (0)| 00:00:01 |
| 1 | NESTED LOOPS OUTER | | 10 | 160 | 2 (0)| 00:00:01 |
| 2 | INDEX FULL SCAN | SYS_C0047401 | 10 | 50 | 1 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID BATCHED| TBLREQS | 1 | 11 | 1 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | IDX_COALESCE_ONUM_PNUM | 1 | | 0 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("N"."NUMS"="R"."SYS_NC00004$"(+))
16 rows selected.