Search code examples
sqloracle11ghierarchical-datahierarchical-query

Oracle SQL hierarical query connected by sequential field


I have a scheme of hierarchical tasks such as this:

|-- task_1
|---- task_1_1
|------ task_1_1_1
|------ task_1_1_2
|---- task_1_2
|------ task_1_2_1
|------ task_1_2_2
|-- task_2
etc.

The number of tasks at each level can vary as can vary the depth of this hierarchy.

In the database they are stored as a sequence :

name sequence parent_sequence project_id
task_1 1 -1 1
task_1_1 2 1 1
task_1_1_1 3 2 1
task_1_1_2 4 2 1
task_1_2 5 1 1
task_1_2_1 6 5 1
task_1_2_2 7 5 1
task_2 8 -1 1
task_b_1 1 -1 2
task_b_1_1 2 1 2
task_b_1_1_1 3 2 2

To reproduce the hierarchy, I use this query

SELECT
    kid.name AS kid_name,
    parent.name AS parent_name
FROM
    task kid
    LEFT JOIN task parent ON parent.sequence = kid.parent_sequence 
                             AND parent.project_id = kid.parent_id

When I try to use a hierarchical query as such

SELECT
    task.name AS task_name,
    SYS_CONNECT_BY_PATH(task.name, '/') AS task_path
FROM
   task
START WITH task.parent_sequence = -1
CONNECT BY PRIOR task.sequence = task.parent_sequence 

it doesn't account for the condition parent_id = parent_id.

I tried to hardcode a WHERE parent_id = 1 and a LEFT JOIN to be able to do a AND kid.project_id = parent.project_id in the CONNECT BY PRIOR and other combinations, but each time, it builds the hierarchy on the integral of the records.

Matching task.sequence = task.parent_sequence, hence false duplicates like task_1/task_b_1_1.

The problem being that multiple records can hold the same parent_sequence and sequence (because they are calculated in the scope of one project_id).

I can not touch the db schematic.

How can I build a hierarchical query grouped by project_id?

Cheers


Solution

  • You can add the project_id comparison to the CONNECT BY filter:

    SELECT project_id,
           task.name AS task_name,
           SYS_CONNECT_BY_PATH(task.name, '/') AS task_path
    FROM   task
    START WITH task.parent_sequence = -1
    CONNECT BY
          PRIOR task.sequence = task.parent_sequence 
    AND   PRIOR project_id    = project_id
    ORDER SIBLINGS BY project_id, sequence
    

    Which, for the sample data:

    CREATE TABLE task (name, sequence, parent_sequence, project_id) AS
    SELECT 'task_1',       1, -1, 1 FROM DUAL UNION ALL
    SELECT 'task_1_1',     2,  1, 1 FROM DUAL UNION ALL
    SELECT 'task_1_1_1',   3,  2, 1 FROM DUAL UNION ALL
    SELECT 'task_1_1_2',   4,  2, 1 FROM DUAL UNION ALL
    SELECT 'task_1_2',     5,  1, 1 FROM DUAL UNION ALL
    SELECT 'task_1_2_1',   6,  5, 1 FROM DUAL UNION ALL
    SELECT 'task_1_2_2',   7,  5, 1 FROM DUAL UNION ALL
    SELECT 'task_2',       8, -1, 1 FROM DUAL UNION ALL
    SELECT 'task_b_1',     1, -1, 2 FROM DUAL UNION ALL
    SELECT 'task_b_1_1',   2,  1, 2 FROM DUAL UNION ALL
    SELECT 'task_b_1_1_1', 3,  2, 2 FROM DUAL;
    

    Outputs:

    PROJECT_ID TASK_NAME TASK_PATH
    1 task_1 /task_1
    1 task_1_1 /task_1/task_1_1
    1 task_1_1_1 /task_1/task_1_1/task_1_1_1
    1 task_1_1_2 /task_1/task_1_1/task_1_1_2
    1 task_1_2 /task_1/task_1_2
    1 task_1_2_1 /task_1/task_1_2/task_1_2_1
    1 task_1_2_2 /task_1/task_1_2/task_1_2_2
    1 task_2 /task_2
    2 task_b_1 /task_b_1
    2 task_b_1_1 /task_b_1/task_b_1_1
    2 task_b_1_1_1 /task_b_1/task_b_1_1/task_b_1_1_1

    db<>fiddle here