Search code examples
sqloracleindexingoracle18coracle-spatial

Utilize a function-based index to precompute columns, while also selecting non-indexed columns


Test data:

create table lines (id number(38,0), 
                    details1 varchar2(10), 
                    details2 varchar2(10), 
                    details3 varchar2(10), 
                    shape sdo_geometry);
begin
    insert into lines (id, details1, details2, details3, shape) values (1, 'a', 'b', 'c', sdo_geometry(2002, 26917, null, sdo_elem_info_array(1, 2, 1), sdo_ordinate_array(574360, 4767080, 574200, 4766980)));
    insert into lines (id, details1, details2, details3, shape) values (2, 'a', 'b', 'c', sdo_geometry(2002, 26917, null, sdo_elem_info_array(1, 2, 1), sdo_ordinate_array(573650, 4769050, 573580, 4768870)));
    insert into lines (id, details1, details2, details3, shape) values (3, 'a', 'b', 'c', sdo_geometry(2002, 26917, null, sdo_elem_info_array(1, 2, 1), sdo_ordinate_array(574290, 4767090, 574200, 4767070)));
    insert into lines (id, details1, details2, details3, shape) values (4, 'a', 'b', 'c', sdo_geometry(2002, 26917, null, sdo_elem_info_array(1, 2, 1), sdo_ordinate_array(571430, 4768160, 571260, 4768040)));
    ...
end;
/

Full test data here: db<>fiddle


I want to pre-compute calculated columns via a function-based index.

Steps:

(1) Create functions to get the startpoint X & startpoint Y coordinates (numbers) from the SDO_GEOMETRY column :

create function startpoint_x(shape in sdo_geometry) return number 
deterministic is
begin
    return shape.sdo_ordinates(1);
end; 
   
create function startpoint_y(shape in sdo_geometry) return number 
deterministic is
begin
    return shape.sdo_ordinates(2);
end;  

select
    id,
    details1,
    details2,
    details3,
    startpoint_x(shape) as startpoint_x,
    startpoint_y(shape) as startpoint_y
from
    lines

        ID DETAILS1   DETAILS2   DETAILS3   STARTPOINT_X STARTPOINT_Y
---------- ---------- ---------- ---------- ------------ ------------
       177 a          b          c                574660      4766400
       178 a          b          c                574840      4765370
       179 a          b          c                573410      4768570
       180 a          b          c                573000      4767330
       ...

[full table scan]

(2) Create a FBI that stores the ID, startpoint X, and startpoint Y in a composite index:

create index lines_fbi_idx on lines (id, startpoint_x(shape), startpoint_y(shape))

(3) When I only select the indexed columns, the FBI gets invoked (no full table scan), which is good:

select
    id,
    startpoint_x(shape) as startpoint_x,
    startpoint_y(shape) as startpoint_y
from
    lines
where --https://stackoverflow.com/a/59581129/5576771
  id is not null
  and startpoint_x(shape) is not null
  and startpoint_y(shape) is not null

--------------------------------------------------------------------------------------
| Id  | Operation            | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |               |     3 |   117 |     4   (0)| 00:00:01 |
|*  1 |  INDEX FAST FULL SCAN| LINES_FBI_IDX |     3 |   117 |     4   (0)| 00:00:01 |
--------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------

PLAN_TABLE_OUTPUT                                                                                                                                                --------------
   1 - filter("ID" IS NOT NULL AND "INFRASTR"."STARTPOINT_X"("SHAPE") IS NOT 
              NULL AND "INFRASTR"."STARTPOINT_Y"("SHAPE") IS NOT NULL)
 
Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

That works as expected.

Note: That's just a bare-bones example for the purposes of this post. In reality, the custom functions would have more complicated logic and be a lot slower — hence the need for precomputing in an index.


Question:

In addition to selecting the indexed columns (ID, startpoint_x, and startpoint_y), I also want to select the non-indexed columns: details1,details2, and details3.

How can I utilize the function-based index for the precomputed columns, while also selecting the non-indexed columns?


Solution

  • I think I can do it by simply:

    1. Selecting only the indexed columns.
    2. Joining to the non-indexed columns.

    select
        a.id,
        a.startpoint_x,
        a.startpoint_y,
        b.details1,
        b.details2,
        b.details3
    from
        (
        select
            id,
            startpoint_x(shape) as startpoint_x,
            startpoint_y(shape) as startpoint_y
        from
            lines
        where --https://stackoverflow.com/a/59581129/5576771
          id is not null
          and startpoint_x(shape) is not null
          and startpoint_y(shape) is not null
        ) a
    left join
        lines b
        on a.id = b.id
    
    ------------------------------------------------------------------------------------------------------
    | Id  | Operation                            | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
    ------------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT                     |               |     3 |   219 |    10   (0)| 00:00:01 |
    |   1 |  NESTED LOOPS OUTER                  |               |     3 |   219 |    10   (0)| 00:00:01 |
    |*  2 |   INDEX FAST FULL SCAN               | LINES_FBI_IDX |     3 |   117 |     4   (0)| 00:00:01 |
    |   3 |   TABLE ACCESS BY INDEX ROWID BATCHED| LINES         |     1 |    34 |     2   (0)| 00:00:01 |
    |*  4 |    INDEX RANGE SCAN                  | LINES_FBI_IDX |     1 |       |     1   (0)| 00:00:01 |
    ------------------------------------------------------------------------------------------------------
    
    PLAN_TABLE_OUTPUT                                                                                                                                                --------------
     
    Predicate Information (identified by operation id):
    ---------------------------------------------------
     
       2 - filter("INFRASTR"."STARTPOINT_X"("SHAPE") IS NOT NULL AND 
                  "INFRASTR"."STARTPOINT_Y"("SHAPE") IS NOT NULL AND "ID" IS NOT NULL)
       4 - access("ID"="B"."ID"(+))
     
    Note
    -----
       - dynamic statistics used: dynamic sampling (level=2)
    

    I had incorrectly assumed that it would be more complicated than that. I thought I might need to force Oracle to invoke the FBI when selecting the indexed columns, while also doing a table scan to get the unindexed columns. I wondered if some sort of materialized hint would be needed. But it looks like the join did what I want without needing a hint.


    In general, the benefit of this overall non-spatial FBI solution is:

    It works around a limitation with function-based spatial indexes:

    From @DavidLapp of Oracle

    A spatial index is invoked only by the WHERE clause, not the SELECT list.

    Utilize function-based spatial index in SELECT list

    Whereas my non-spatial FBI doesn’t have that problem. It can be invoked by the SELECT list, as long as there are IS NOT NULLs in the where clause.

    Next step:

    In theory, maybe the resulting numeric XY columns could be "spatialized" after-the-fact:

    A) Convert the XY to SDO_GEOMETRY on-the-fly / after-the-fact in a query.

    or B) Use GIS software to display the XYs as points in the map — using a dynamic "layer" in the application. For example, use ArcGIS Pro's "XY Event Layer" functionality (which is surprisingly fast: https://gis.stackexchange.com/questions/380176/are-xy-event-layers-optimized-indexed-on-the-fly).

    Source: “Regarding the idea of indexing/returning a non-spatial datatype…”