Search code examples
sqloracle-databasehierarchical-dataoracle18c

Can "connect by" do running calculations when creating a hierarchy


I am trying to create a query that can create a hierarchy of occurrences in my data. However, since each production record can have several occurrences I need to create a weight for each hierarchy so I know which one is correct and can later filter for the specified result.

This weight would be the sum of the distance for each node defined as the sum of the absolute value of the difference between initiation and occurrence.

Had there been a perfect relationship between initiation and occurrence I would have used it as a criteria in the connect by statement, but as it's not the case I wondered if it is at all possible to sum the absolute value of each node so I can get an "accuracy" measure to use as a filtering column for further analysis?

Data

PROD MGRID init occ
100 NULL 2001/1/1 2016/4/12
100 NULL 2001/1/1 2017/2/11
100 NULL 2001/1/1 2017/7/19
101 100 2017/2/11 2017/2/21
102 100 2017/2/11 2017/2/11
103 100 2016/5/12 2016/5/12
201 103 2016/5/12 2016/5/12
202 101 2017/2/20 2017/2/21
203 100 2017/7/20 2017/7/19

Current Code

SELECT
    prod, mgrid, init, occ, level AS lvl, sys_connect_by_path(prod, '/') AS path
FROM mytab
WHERE level > 1
START WITH
    mgrid IS NULL
CONNECT BY
    PRIOR prod = mgrid
ORDER BY
    prod;

Solution

  • Use a recursive sub-query:

    WITH data (prod, mgrid, init, occ, lvl, path, weight) AS (
      SELECT prod,
             mgrid,
             init,
             occ,
             1,
             '/' || prod,
             ABS(occ-init)
      FROM   mytab
      WHERE  mgrid IS NULL
    UNION ALL
      SELECT m.prod,
             m.mgrid,
             m.init,
             m.occ,
             lvl + 1,
             d.path || '/' || m.prod,
             d.weight + ABS(m.occ - m.init)
      FROM   mytab m
             INNER JOIN data d
             ON (d.prod = m.mgrid)
    )
    SELECT *
    FROM   data
    WHERE  lvl > 1
    ORDER BY prod;
    

    Which, for your sample data:

    CREATE TABLE mytab (PROD, MGRID, init, occ) AS
    SELECT 100, NULL, DATE '2001-01-01', DATE '2016-04-12' FROM DUAL UNION ALL
    SELECT 100, NULL, DATE '2001-01-01', DATE '2017-02-11' FROM DUAL UNION ALL
    SELECT 100, NULL, DATE '2001-01-01', DATE '2017-07-19' FROM DUAL UNION ALL
    SELECT 101, 100,  DATE '2017-02-11', DATE '2017-02-21' FROM DUAL UNION ALL
    SELECT 102, 100,  DATE '2017-02-11', DATE '2017-02-11' FROM DUAL UNION ALL
    SELECT 103, 100,  DATE '2016-05-12', DATE '2016-05-12' FROM DUAL UNION ALL
    SELECT 201, 103,  DATE '2016-05-12', DATE '2016-05-12' FROM DUAL UNION ALL
    SELECT 202, 101,  DATE '2017-02-20', DATE '2017-02-21' FROM DUAL UNION ALL
    SELECT 203, 100,  DATE '2017-07-20', DATE '2017-07-19' FROM DUAL;
    

    Outputs:

    PROD MGRID INIT OCC LVL PATH WEIGHT
    101 100 11-FEB-17 21-FEB-17 2 /100/101 5590
    101 100 11-FEB-17 21-FEB-17 2 /100/101 6053
    101 100 11-FEB-17 21-FEB-17 2 /100/101 5895
    102 100 11-FEB-17 11-FEB-17 2 /100/102 5580
    102 100 11-FEB-17 11-FEB-17 2 /100/102 6043
    102 100 11-FEB-17 11-FEB-17 2 /100/102 5885
    103 100 12-MAY-16 12-MAY-16 2 /100/103 5580
    103 100 12-MAY-16 12-MAY-16 2 /100/103 6043
    103 100 12-MAY-16 12-MAY-16 2 /100/103 5885
    201 103 12-MAY-16 12-MAY-16 3 /100/103/201 5580
    201 103 12-MAY-16 12-MAY-16 3 /100/103/201 6043
    201 103 12-MAY-16 12-MAY-16 3 /100/103/201 5885
    202 101 20-FEB-17 21-FEB-17 3 /100/101/202 6054
    202 101 20-FEB-17 21-FEB-17 3 /100/101/202 5896
    202 101 20-FEB-17 21-FEB-17 3 /100/101/202 5591
    203 100 20-JUL-17 19-JUL-17 2 /100/203 5581
    203 100 20-JUL-17 19-JUL-17 2 /100/203 5886
    203 100 20-JUL-17 19-JUL-17 2 /100/203 6044

    db<>fiddle here